ASYMPTOTIC BEHAVIOR OF SOME FACTORIZATIONS OF RANDOM WORDS



PHILIPPE CHASSAING AND ELAHE ZOHOORIAN AZAD 

Abstract. This paper considers the normalized lengths of the factors of the Lyndon decomposition of finite random words with n independent letters drawn from a finite or infinite totally ordered alphabet according to a general probability distribution. We prove, firstly, that the limit law of the lengths of the smallest Lyndon factors is a variant of the stickbreaking process. Convergence of the distribution of the lengths of the longest factors to a Poisson-Dirichlet distribution follows. Secondly, we prove that the distribution of the normalized length of the standard right factor of a random n-letter Lyndon word, derived from such an alphabet, converges, when n is large, to:

$$\mu(dx) = p_1\,\delta_1(dx) + (1-p_1)\,\mathbf{1}_{[0,1)}(x)\,dx,$$

in which $p_1$ denotes the probability of the smallest letter of the alphabet.



1. Introduction 

First, recall some general definitions from [Lot83, Reu93]. Let $\mathcal{A} = \{a_1, a_2, \dots\}$ be an ordered alphabet ($a_1 < a_2 < \cdots$), finite or infinite, and let $\mathcal{A}^n$ be the corresponding set of n-letter words. If $w \in \mathcal{A}^n$, write $w = w_1 \dots w_n$ and define $\tau w = w_2 \cdots w_n w_1$. Then $\langle\tau\rangle = \{\mathrm{Id}, \tau, \dots, \tau^{n-1}\}$ is the group of cyclic permutations of the letters of a word of length n. The orbit $\langle w\rangle$ of a word w under $\langle\tau\rangle$ is called a necklace. A word $w \in \mathcal{A}^n$ is called primitive if its necklace $\langle w\rangle$ has exactly n elements. In this case the necklace is said to be aperiodic. A Lyndon word is a primitive word that is minimal in its necklace, with respect to the lexicographic order. A word v is a factor of a word w if there exist two other words s and t, possibly empty, such that w = svt. If s (resp. t) is empty, v is a prefix (resp. a suffix) of w. A word v is a factor of a necklace $\langle w\rangle$ if v is a factor of some word $w' \in \langle w\rangle$.
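The definitions above can be checked mechanically on small examples. The following Python sketch is only an illustration. The function names are ours, and letters are Python characters ordered by code point, so 'a' < 'b' < 'c' plays the role of $a_1 < a_2 < a_3$. It tests primitivity and the Lyndon property by listing the whole necklace. That costs $O(n^2)$, so the sketch is meant for short words only.

```python
def rotations(w: str) -> list[str]:
    """The necklace <w>: all cyclic shifts tau^k w, k = 0, ..., n-1 (with repetitions)."""
    return [w[k:] + w[:k] for k in range(len(w))]


def is_primitive(w: str) -> bool:
    """w is primitive iff its necklace has exactly len(w) distinct elements."""
    return len(set(rotations(w))) == len(w)


def is_lyndon(w: str) -> bool:
    """A Lyndon word is a primitive word that is minimal in its necklace."""
    return len(w) > 0 and is_primitive(w) and w == min(rotations(w))
```

For instance, aab is a Lyndon word, aba is primitive but not minimal in its necklace, and abab is not primitive.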

The standard right factor v of a word w is its smallest proper suffix in the lexicographic order. The related factorization uv of w is often called the standard factorization of w. When w is a Lyndon word, both the standard right factor v and the corresponding prefix u (such that w = uv) are also Lyndon words. The standard factorization of a Lyndon word is the first step in the construction of a basis of the free Lie algebra over $\mathcal{A}$, due to Lyndon [Lyn54] (see for instance [Lot83] or [Reu93]).
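Here is a direct, quadratic sketch of the standard factorization, under the same conventions as before: letters are ordered as Python characters, and the helper name is hypothetical. The standard right factor is found as the minimum over the proper suffixes.

```python
def standard_factorization(w: str) -> tuple[str, str]:
    """Return (u, v) with w = u v, where v is the smallest proper suffix of w
    in lexicographic order (the standard right factor)."""
    if len(w) < 2:
        raise ValueError("a nonempty proper suffix requires len(w) >= 2")
    v = min(w[i:] for i in range(1, len(w)))
    return w[: len(w) - len(v)], v
```

For the Lyndon word aabab this returns (aab, ab), and both factors are again Lyndon words.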

On the other hand, according to [Lot83, Theorem 5.1.5],

Theorem 1.1 (Lyndon). Any word $w \in \mathcal{A}^+$ has a unique factorization as a nonincreasing product of Lyndon words:

$$w = w_\ell\, w_{\ell-1} \cdots w_2\, w_1, \qquad w_i \in \mathcal{L}, \qquad w_\ell \ge w_{\ell-1} \ge \cdots \ge w_2 \ge w_1,$$

in which $\mathcal{A}^+$ is the set of nonempty words on the alphabet $\mathcal{A}$ and $\mathcal{L}$ the set of Lyndon words on this alphabet.

If w is a Lyndon word, $w_1 = w$; otherwise $w_1$ is the standard right factor of w. In this paper, we study the asymptotic behavior of probability distributions related to these factorizations.
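Theorem 1.1 is effective: the factorization can be computed in linear time by Duval's algorithm, which is not discussed in this paper. The sketch below is a standard transcription of that algorithm. The choice of method is ours and the function name is hypothetical. It returns the factors from left to right, that is $w_\ell, \dots, w_1$.

```python
def lyndon_factorization(w: str) -> list[str]:
    """Duval's algorithm: split w into a nonincreasing product of Lyndon words,
    returned from left to right (w_l, ..., w_1 in the notation of Theorem 1.1)."""
    factors = []
    n, i = len(w), 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run of copies of a Lyndon word as long as possible.
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        # Emit the copies of the Lyndon word of length j - k found so far.
        while i <= k:
            factors.append(w[i : i + j - k])
            i += j - k
    return factors
```

On banana it yields b, an, an, a. The last factor, a, is also the smallest suffix of the word, in line with the remark above.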



1.1. Random words and random Lyndon words. From now on, we consider a general probability distribution $(p_i)_{i\ge1}$ on a set $\mathcal{A} = \{a_1 < a_2 < \dots\}$ of letters, and we assume, without loss of generality, that $0 < p_1 < 1$, i.e. the probability that a word contains at least two distinct letters does not vanish for n large. On the corresponding set of words, we define the weight p(w) of a word $w = a_{\varepsilon_1(w)}\, a_{\varepsilon_2(w)} \cdots a_{\varepsilon_n(w)}$ as

$$p(w) = p_{\varepsilon_1(w)}\, p_{\varepsilon_2(w)} \cdots p_{\varepsilon_n(w)}.$$

The weight p(.) defines a probability measure $\mathbb{P}_n$ on the set $\mathcal{A}^n$, through

$$\mathbb{P}_n(\{w\}) = p(w).$$

$\mathcal{P}_n$ (resp. $\mathcal{N}_n$, $\mathcal{L}_n$) denotes the set of n-letter primitive words (resp. its complement, resp. the set of n-letter Lyndon words). Then we define a probability measure $\mathbb{L}_n$ on $\mathcal{L}_n$ as follows:

$$\mathbb{L}_n(\{w\}) = \lambda_n\, p(w),$$

in which $\lambda_n = 1/\mathbb{P}_n(\mathcal{L}_n) = n/\mathbb{P}_n(\mathcal{P}_n)$. The probability measure $\mathbb{L}_n$ has a trivial extension to $\mathcal{A}^n$ (setting $\mathbb{L}_n(\mathcal{L}_n^c) = 0$).

1.2. Main results. The sequence $\rho^{(n)}(w) = (\rho_{i,n}(w))_{i\ge1}$ of normalized lengths of the Lyndon factors of a word $w \in \mathcal{A}^n$, with Lyndon factorization $w = w_\ell\, w_{\ell-1} \cdots w_1$, is defined as follows:

$$\rho_{i,n}(w) = \frac{|w_i|}{n} \ \text{ if } 1 \le i \le \ell, \qquad \rho_{i,n}(w) = 0 \ \text{ if } i > \ell.$$

Thus $\rho_{i,n}(w)$ denotes the normalized length of the i-th smallest Lyndon factor of w. Our first result describes the limit distribution, as n grows, of the sequence $\rho^{(n)}(w) = (\rho_{i,n}(w))_{i\ge1}$, seen as a random variable on $(\mathcal{A}^n, \mathbb{P}_n)$. We have:
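Here is a small sketch of $\rho^{(n)}(w)$ for a fixed word. It relies only on the fact, recalled after Theorem 1.1, that the smallest Lyndon factor $w_1$ of a word is its smallest suffix. Peeling off smallest suffixes one at a time therefore lists the factors $w_1, w_2, \dots$ in the order used for $\rho_{i,n}$. The function name is hypothetical and the trailing zeros of the sequence are omitted. The cost is quadratic, so this is a didactic check rather than an efficient routine.

```python
from fractions import Fraction


def normalized_factor_lengths(w: str) -> list[Fraction]:
    """Nonzero terms rho_{1,n}(w), rho_{2,n}(w), ... : |w_i| / n for the Lyndon
    factors listed from the right end of w (smallest factor first)."""
    n, rest, rho = len(w), w, []
    while rest:
        # The smallest suffix of a word is its last (smallest) Lyndon factor.
        w1 = min(rest[i:] for i in range(len(rest)))
        rho.append(Fraction(len(w1), n))
        rest = rest[: len(rest) - len(w1)]
    return rho
```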

Theorem 1.2. For a totally ordered alphabet with probability distribution p on its letters, $\rho^{(n)}$ converges in law, when $n \to \infty$, to the random sequence $\rho = (\rho_i)_{i\ge1}$ whose law is defined by the law of $\rho_1$:

$$\mu(dx) = p_1\,\delta_0(dx) + (1-p_1)\,\mathbf{1}_{(0,1]}(x)\,dx,$$

and the conditional law of $\rho_i$ given $(\rho_1, \rho_2, \dots, \rho_{i-1})$:

$$\begin{cases} p_1\,\delta_0(dx) + (1-p_1)\,\mathbf{1}_{(0,1]}(x)\,dx, & y = 0,\\[2pt] \dfrac{1}{1-y}\,\mathbf{1}_{(0,1-y]}(x)\,dx, & y > 0,\end{cases}$$

in which y denotes $\rho_1 + \rho_2 + \cdots + \rho_{i-1}$.

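In other words, if we set $s_i = 1 - (\rho_1 + \rho_2 + \cdots + \rho_i)$, then $s = (s_i)_{i\ge0}$ is a Markov chain starting from 1 at time 0, with transition probability

$$p(y, dx) = \begin{cases} p_1\,\delta_1(dx) + (1-p_1)\,\mathbf{1}_{[0,1)}(x)\,dx, & y = 1,\\[2pt] \dfrac{1}{y}\,\mathbf{1}_{[0,y)}(x)\,dx, & y < 1.\end{cases}$$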
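The limit law in Theorem 1.2 is easy to simulate directly from the Markov chain s. The sketch below is an illustration under our own conventions: the function name is hypothetical and `random.Random` is the source of randomness. As long as no mass has been broken off, a step produces 0 with probability $p_1$. Otherwise the next piece is uniform on what remains of the stick.

```python
import random


def sample_limit_rho(p1: float, steps: int, rng: random.Random) -> list[float]:
    """Sample rho_1, ..., rho_steps from the limit law of Theorem 1.2.

    While y = rho_1 + ... + rho_{i-1} is 0, rho_i is 0 with probability p1 and
    uniform on (0, 1] otherwise; once y > 0, rho_i is uniform on (0, 1 - y].
    """
    rho, y = [], 0.0
    for _ in range(steps):
        if y == 0.0:
            # 1 - U with U uniform on [0, 1) is uniform on (0, 1].
            x = 0.0 if rng.random() < p1 else 1.0 - rng.random()
        else:
            x = (1.0 - y) * (1.0 - rng.random())  # uniform on (0, 1 - y]
        rho.append(x)
        y += x
    return rho
```

Sorting the nonzero entries of a long sample in decreasing order should give an approximate draw from the Poisson-Dirichlet(1) law of Corollary 1.3. This is a heuristic use of the sampler, not part of the proofs.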

The process s is a variant of the stickbreaking process [McC65, PPY92] related to the Poisson-Dirichlet(0,1) distribution, in which the first attempts to break the stick would fail (each with probability $p_1$) and would produce a geometric number of fragments of size 0 at the beginning of the process, while for the stickbreaking process the transition probability p(y, dx) is $\frac{1}{y}\,\mathbf{1}_{[0,y)}(x)\,dx$ for any $y \in (0,1]$. Of course, when $\rho^{(n)}$ and ρ are rearranged in decreasing order, the small initial fragments are rejected at the end or, in the case of ρ, they disappear. Thus

Corollary 1.3. The decreasing rearrangement of $\rho^{(n)}$ converges in law to the Poisson-Dirichlet(1) distribution.

As regards the second result, for any Lyndon word $w \in \mathcal{L}_n$, let $R_n(w)$ denote the length of its standard right factor, and set $r_n = R_n/n$. We have:

Theorem 1.4. For a totally ordered alphabet with probability distribution p on its letters, the normalized length $r_n$ of the standard right factor of a random n-letter Lyndon word converges in law, when $n \to \infty$, to

$$\mu(dx) = p_1\,\delta_1(dx) + (1-p_1)\,\mathbf{1}_{[0,1)}(x)\,dx,$$

where $\delta_1$ denotes the Dirac mass at the point 1 and dx the Lebesgue measure on $\mathbb{R}$. As a consequence, the moments of $r_n$ converge to the corresponding moments of μ.

For instance, if p is the uniform distribution on q letters, then the limit law of the normalized length of the standard right factor of a random Lyndon word is

$$\mu(dx) = \frac{1}{q}\,\delta_1(dx) + \frac{q-1}{q}\,\mathbf{1}_{[0,1)}(x)\,dx.$$
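Theorem 1.4 can be explored numerically. In the sketch below (our illustration, with hypothetical names), a word is drawn under $\mathbb{P}_n$ and kept only if it is a Lyndon word. Since $\mathbb{L}_n$ is $\mathbb{P}_n$ conditioned on $\mathcal{L}_n$, this rejection scheme samples exactly from $\mathbb{L}_n$, with acceptance probability about 1/n. The function `limit_moment` gives the k-th moment $p_1 + (1-p_1)/(k+1)$ of the limit law μ.

```python
import random
from fractions import Fraction


def limit_moment(p1: Fraction, k: int) -> Fraction:
    """k-th moment of p1 * delta_1 + (1 - p1) * Uniform[0, 1)."""
    return p1 + (1 - p1) / (k + 1)


def sample_lyndon(letters: str, weights: list[float], n: int, rng: random.Random) -> str:
    """Rejection sampling from L_n: draw i.i.d. letters, keep the word if it is Lyndon."""
    while True:
        w = "".join(rng.choices(letters, weights=weights, k=n))
        rotations = [w[k:] + w[:k] for k in range(n)]
        if len(set(rotations)) == n and w == min(rotations):
            return w


def r_n(w: str) -> float:
    """Normalized length of the standard right factor (smallest proper suffix)."""
    return len(min(w[i:] for i in range(1, len(w)))) / len(w)
```

For q = 2 the limit mean is 3/4. The empirical mean of `r_n` over many sampled Lyndon words should therefore approach 3/4 as n grows, though the sketch says nothing about the speed of this convergence.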



In this paper, when applied to words, or factors of words and necklaces (and, sometimes, to 
factors with special properties, that we will call blocks), the adjectives "small" and "large" refer 
to the lexicographic order on words, while "short" and "long" refer to the size (number of letters) 
of factors. 
1.3. Context. The Poisson-Dirichlet family of distributions was introduced by Kingman [Kin75]. This distribution arises as a limit for the sizes of components of decomposable structures in a variety of settings, as shown by Hansen [Han94] or Arratia et al. [ABT99].

When the distribution p is uniform on q letters, i.e.

$$p_k = \frac{1}{q}\,\mathbf{1}_{1 \le k \le q},$$

the combinatorics of the Lyndon decomposition have connections with those of q-shuffles [BD92] and of monic polynomials of degree n over the finite field GF(q), as explained in [GR93, DMP95]. When p is uniform, Corollary 1.3 is well known (cf. [ABT93, Han94]). Actually, for a uniform p, a precise description of the sizes of Lyndon factors in terms of the standard Brownian motion is given in [Han93, ABT93]. Our contribution is twofold:

• in Theorem 1.2 we give a description of the sizes of factors depending on their rank in the decomposition. Obviously, the order of factors matters in the Lyndon decomposition of words, while it does not for polynomials or for shuffles;

• the distribution p on letters is perfectly general (we only require more than one letter). As a consequence, to our knowledge, combinatorics do not provide closed form expressions for the distribution of sizes of factors. Thus, for a general p, we were not able to prove, or to disprove, the conditioning relation (cf. [ABT03, p. 2]) which is usually required for convergence to the Poisson-Dirichlet distribution in such settings.

Theorem 1.4 deals with random words conditioned to be Lyndon words. This Theorem is a first step, as were the papers [BCN05, MZA07], towards the study of the Lyndon tree, which describes the complexity of some algorithms computing bases of the free Lie algebra on $\mathcal{A}$. The line of proof of Theorem 1.4 is the same as in [MZA07], where the case q = 2 was obtained. The proofs of some Lemmas and Theorems in this paper are similar to their analogs in [MZA07]: in this paper, we only give the proofs that are significantly different from the case q = 2. We refer the interested reader to [MZA07] for more remarks and explanations.

1.4. Sketch of proofs. Consider a nonrandom partition of [0, 1] into a (large) number of subintervals with small widths, k of these subintervals being marked. After a random uniform shuffle of these subintervals, the positions $X = (X_i)_{1\le i\le k}$ of the marked subintervals are close to a k-sample of the uniform distribution on [0, 1]. More specifically, their Wasserstein distance is bounded by the maximal width of the subintervals, see Lemma 4.8.
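For a single marked subinterval, the exact law of its left endpoint after a uniform shuffle can be enumerated when there are only a handful of subintervals. The sketch below is our own illustration, with hypothetical function names. It computes that law with exact rational arithmetic, together with its $L_1$-Wasserstein distance to the uniform distribution on [0, 1]. It does not reproduce Lemma 4.8, which concerns k marked subintervals and the $L_2$ metric. With m equal widths 1/m, the distance is 1/(2m), which is below the maximal width 1/m.

```python
from fractions import Fraction
from itertools import permutations


def marked_start_law(widths: list[Fraction], marked: int = 0) -> dict[Fraction, Fraction]:
    """Exact law of the left endpoint of subinterval `marked` after a uniform shuffle."""
    counts: dict[Fraction, int] = {}
    perms = list(permutations(range(len(widths))))
    for perm in perms:
        start = Fraction(0)
        for idx in perm:  # add the widths placed before the marked subinterval
            if idx == marked:
                break
            start += widths[idx]
        counts[start] = counts.get(start, 0) + 1
    return {x: Fraction(c, len(perms)) for x, c in sorted(counts.items())}


def _abs_integral(c: Fraction, lo: Fraction, hi: Fraction) -> Fraction:
    """Integral of |c - t| for t in [lo, hi]."""
    if c <= lo:
        return ((lo - c) + (hi - c)) * (hi - lo) / 2
    if c >= hi:
        return ((c - lo) + (c - hi)) * (hi - lo) / 2
    return ((c - lo) ** 2 + (hi - c) ** 2) / 2


def w1_to_uniform(law: dict[Fraction, Fraction]) -> Fraction:
    """W_1 between a finitely supported law on [0, 1) and U[0, 1]: integral of |F(t) - t|."""
    atoms = sorted(law)
    edges = atoms + [Fraction(1)]
    total = _abs_integral(Fraction(0), Fraction(0), atoms[0])  # F = 0 before the first atom
    cdf = Fraction(0)
    for k, x in enumerate(atoms):
        cdf += law[x]  # F is constant on [x, next atom)
        total += _abs_integral(cdf, x, edges[k + 1])
    return total
```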

To use this principle, one has to build a factorization (partition) of the random word such that:

(1) the distribution of the random word is invariant under a random uniform shuffle of the factors (subintervals);

(2) the lengths of the factors are o(n) (while the Lyndon factors are O(n));

(3) the marked factors are very small with respect to the lexicographic order, for they begin with long runs of the letter $a_1$.
Thus, the marked factors, called "long blocks", are strongly related to the Lyndon decomposition: they are prefixes of the longest Lyndon factors, and their positions, approximately uniform according to Lemma 4.8, govern the lengths of the Lyndon factors.

Section 2 is devoted to preliminary results on some statistics on runs. Especially useful is the observation that the length of the longest run of "$a_1$" is typically of order $\log_{1/p_1} n$.

In Section 3 we describe the partition of a random word of length n into distinct "long blocks" with length of order $\log n$, long blocks that begin with the longest runs of "$a_1$". We have to make sure that, with high probability, the Lyndon property is preserved by permutation of these blocks.

Once these preliminary tasks are performed, we use the shuffling principle, Lemma 4.8, to prove the main results: Theorem 1.4 in Section 4 and Theorem 1.2 in Section 5.

2. Number of runs and length of the longest run 

For $w \in \mathcal{P}_n$ let $\pi(w)$ denote the unique Lyndon word in the necklace of w. For $\alpha \ge 1$, we set

$$\|p\|_\alpha = \Big(\sum_{i\ge1} p_i^{\alpha}\Big)^{1/\alpha}.$$

The next Lemma allows us to translate bounds on $\mathbb{P}_n$ into bounds on $\mathbb{L}_n$:

Lemma 2.1. For $A \subset \mathcal{A}^n$, we have:

$$\big|\mathbb{L}_n(A) - \mathbb{P}_n(\pi^{-1}(A))\big| = O\big(\|p\|_2^{\,n}\big).$$

Note that $\|p\|_1 = 1$, and that, under the assumption $0 < p_1 < 1$, $\|p\|_\alpha$ is strictly decreasing in α. Among other well known inequalities, we shall make use of $\|p\|_2 \le \sqrt{\max_i p_i}$. We set

$$\beta = \max\{p_1, 1-p_1\}.$$

For instance, the choice $A = \mathcal{A}^n$ leads to

$$\big|1 - \mathbb{P}_n(\mathcal{P}_n)\big| = O\big(\|p\|_2^{\,n}\big) = O\big(\beta^{n/2}\big).$$

Due to Lemma 2.1, the asymptotic properties of statistics that behave nicely under cyclic permutations, such as the number of runs and the length of the longest runs, are the same on random words as on random Lyndon words. The preliminary results needed under $\mathbb{L}_n$ and under $\mathbb{P}_n$ for Theorems 1.2 and 1.4 are therefore equivalent.

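Proof. This proof rephrases in probabilistic terms some results of [Reu93, Section 7.1], to which the reader is referred for definitions. Let us define two sequences of subsets of $\mathcal{A}^n$,

$$A_{n,k} = \{w \in \mathcal{A}^n \mid \exists v \in \mathcal{A}^k \text{ such that } w = v^{n/k}\},$$

$$\mathcal{P}_{n,k} = A_{n,k} \setminus \Big(\bigcup_{1 \le i < k} A_{n,i}\Big),$$

with probabilities $\nu_k = \mathbb{P}_n(A_{n,k})$ and $\xi_k = \mathbb{P}_n(\mathcal{P}_{n,k})$, respectively. Clearly

$$A_{n,n} = \mathcal{A}^n, \qquad \mathcal{P}_{n,n} = \mathcal{P}_n.$$

Also, if $k \mid n$, $(\mathcal{P}_{n,i})_{i \mid k}$ is a partition of $A_{n,k}$ (else, both $A_{n,k}$ and $\mathcal{P}_{n,k}$ are empty). Thus, for $k \mid n$,

$$\nu_k = \sum_{d \mid k} \xi_d,$$

and, by the Möbius inversion formula,

$$\xi_k = \sum_{d \mid k} \mu(d)\, \nu_{k/d}, \tag{1}$$

in which μ(d) denotes the Möbius function. On the other hand, when $k \mid n$,

$$\nu_k = \sum_{v \in \mathcal{A}^k} p(v)^{n/k} = \sum_{v \in \mathcal{A}^k} \prod_{j=1}^{k} p_{\varepsilon_j(v)}^{n/k} = \Big(\sum_{i\ge1} p_i^{n/k}\Big)^{k} = \|p\|_{n/k}^{\,n}.$$

Specializing (1) to k = n, we obtain

$$\mathbb{P}_n(\mathcal{P}_n) = \sum_{d \mid n} \mu(d)\, \|p\|_d^{\,n}. \tag{2}$$

Let the set of divisors of n be $\{1 = d_1 < d_2 < d_3 < \cdots < d_\ell = n\}$. Then, by (2),

$$\big|\mathbb{P}_n(\mathcal{P}_n) - 1 + \|p\|_{d_2}^{\,n}\big| \le (\ell - 2)\, \|p\|_{d_3}^{\,n} \le (n-2)\, \|p\|_3^{\,n},$$

if n is not prime. Else $\mathbb{P}_n(\mathcal{P}_n) = 1 - \|p\|_n^{\,n}$. In any case, $\big|\mathbb{P}_n(\mathcal{P}_n) - 1 + \|p\|_{d_2}^{\,n}\big|$ is $o\big(\|p\|_2^{\,n}\big)$ and, since $d_2 \ge 2$,

$$\mathbb{P}_n(\mathcal{P}_n^c) = O\big(\|p\|_2^{\,n}\big). \tag{3}$$

Lemma 2.1 is a direct consequence of

$$\mathbb{L}_n(A) = \frac{\mathbb{P}_n(\pi^{-1}(A))}{\mathbb{P}_n(\mathcal{P}_n)}$$

and of (3). □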
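Formula (2) can be checked against brute force on a small alphabet. The sketch below is our illustration and the function names are hypothetical. It uses exact rational arithmetic, so the two computations must agree exactly. `prob_primitive_formula` implements (2), and `prob_primitive_bruteforce` sums p(w) over all words with n distinct rotations.

```python
from fractions import Fraction
from itertools import product


def mobius(d: int) -> int:
    """Mobius function mu(d), by trial division."""
    result, m, f = 1, d, 2
    while f * f <= m:
        if m % f == 0:
            m //= f
            if m % f == 0:  # f^2 divides d
                return 0
            result = -result
        f += 1
    if m > 1:  # one remaining prime factor
        result = -result
    return result


def prob_primitive_formula(p: list[Fraction], n: int) -> Fraction:
    """P_n(P_n) = sum over d | n of mu(d) * ||p||_d^n, as in (2)."""
    total = Fraction(0)
    for d in range(1, n + 1):
        if n % d == 0:
            power_sum = sum(pi ** d for pi in p)        # ||p||_d^d
            total += mobius(d) * power_sum ** (n // d)  # ||p||_d^n
    return total


def prob_primitive_bruteforce(p: list[Fraction], n: int) -> Fraction:
    """Sum of p(w) over all words w of length n that have n distinct rotations."""
    total = Fraction(0)
    for word in product(range(len(p)), repeat=n):
        if len({word[k:] + word[:k] for k in range(n)}) == n:
            weight = Fraction(1)
            for letter in word:
                weight *= p[letter]
            total += weight
    return total
```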

Definition 2.2. Set $\mathcal{B} = \{0, 1\}$. Now, let φ denote the morphism from $\mathcal{A}^*$ to $\mathcal{B}^*$ that sends the letter $a_1$ to the digit 0 and any other letter of $\mathcal{A}$ to the digit 1, and thus any word $w \in \mathcal{A}^n$ to a word $\varphi(w) \in \mathcal{B}^n$. We denote by $N_n(w)$ the number of runs in φ(w), by $X_1(w), X_2(w), \dots, X_{N_n}(w)$ their lengths, and by $N_n^{(e)}(w)$ (resp. $M_n^{(e)}(w)$) the number of runs of the digit $e \in \mathcal{B}$ in the word φ(w) (resp. the maximal length of such runs).
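Here is a sketch of the statistics of Definition 2.2. It assumes, as our own convention, that a word is given as the list of indices i of its letters $a_i$, so that the index 1 stands for $a_1$. The keys of the returned dictionary are hypothetical names for $N_n$, $X_i$, $N_n^{(e)}$ and $M_n^{(e)}$.

```python
from itertools import groupby


def run_statistics(word: list[int]) -> dict:
    """Runs of phi(w), where phi maps letter index 1 (a_1) to 0 and any other letter to 1.

    Returns N_n ("N"), the run lengths X_1, ..., X_{N_n} ("X"), and for each digit e
    the number N_n^(e) ("N0", "N1") and maximal length M_n^(e) ("M0", "M1") of its runs.
    """
    phi = [0 if letter == 1 else 1 for letter in word]
    runs = [(digit, len(list(group))) for digit, group in groupby(phi)]
    stats = {"N": len(runs), "X": [length for _, length in runs]}
    for e in (0, 1):
        lengths = [length for digit, length in runs if digit == e]
        stats[f"N{e}"] = len(lengths)
        stats[f"M{e}"] = max(lengths, default=0)
    return stats
```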
Lemma 2.3 (Number of runs of the letter $a_1$).

$$\mathbb{P}_n\Big(N_n^{(0)} < \tfrac{p_1(1-p_1)}{2}\,n\Big) = O(n^{-1}),$$

and

$$\mathbb{L}_n\Big(N_n^{(0)} < \tfrac{p_1(1-p_1)}{2}\,n\Big) = O(n^{-1}).$$

Proof. In the case $p_1 = 0.5$, $N_n - 1$ has a binomial distribution under $\mathbb{P}_n$ (see [MZA07, Lemma 4.1]), but this property is lost as soon as $p_1 \ne 0.5$. In this general case, we shall construct a random word $t_n(\omega)$ in $\mathcal{B}^n$ by truncation of an infinite word ω on the alphabet $\mathcal{B}$:

$$t_n(\omega) = \omega_1 \omega_2 \cdots \omega_n,$$

in which the digits $\omega_i$ are independent, with $\mathbb{P}(\omega_i = 1) = 1 - p_1$. In other words, ω is a Bernoulli process with parameter $1 - p_1$. Let $\mathbb{P}$ denote the distribution of ω, an infinite product of Bernoulli distributions with parameter $1 - p_1$.

We set $\xi = 1 - \omega_1$. For an element ω of the (almost sure) subset Ω of infinite words that do not end with an infinite run, let $\eta(\omega) = (\eta_i(\omega))_{i\ge1}$ (resp. $\theta(\omega) = (\theta_i(\omega))_{i\ge1}$) denote the sequence of lengths of runs of the digit 0 (resp. 1) in ω. Under the probability measure $\mathbb{P}$,

• ξ, η and θ are independent,

• η is a sequence of independent geometric random variables with expectation $(1-p_1)^{-1}$,

• θ is a sequence of independent geometric random variables with expectation $p_1^{-1}$,

• ξ is a Bernoulli random variable with parameter $p_1$.

The proof of Lemma 2.3 relies on the fact that the distribution of the prefix $t_n$ under the probability measure $\mathbb{P}$ is also the image of $\mathbb{P}_n$ under φ (in other terms, $t_n$ and φ have the same distribution), and $N_n$ will denote indifferently a statistic on φ(w) or on $t_n(\omega)$.

For k > 0, set $S_k^\eta = \sum_{i=1}^k \eta_i$ and $S_k^\theta = \sum_{i=1}^k \theta_i$. Then $\{N_n^{(0)} \circ t_n < k\}$ holds only if

$$\big\{\xi = 1 \text{ and } S_k^\eta + S_{k-1}^\theta > n\big\} \cup \big\{\xi = 0 \text{ and } S_k^\eta + S_k^\theta > n\big\}.$$

Thus, by Chebyshev's inequality, as soon as $\mathbb{E}[S_k^\eta + S_k^\theta] < n$,

$$\mathbb{P}_n\big(N_n^{(0)} < k\big) \le \mathbb{P}\big(S_k^\eta + S_k^\theta > n\big) \le \frac{\operatorname{Var}(S_k^\eta + S_k^\theta)}{\big(n - \mathbb{E}[S_k^\eta + S_k^\theta]\big)^2}.$$

With the choice $k = \lceil p_1(1-p_1)n/2 \rceil$, we have $\mathbb{E}[S_k^\eta + S_k^\theta] = k/(p_1(1-p_1)) = n/2 + O(1)$ and $\operatorname{Var}(S_k^\eta + S_k^\theta) = O(n)$. We obtain

$$\mathbb{P}_n\Big(N_n^{(0)} < \tfrac{p_1(1-p_1)}{2}\,n\Big) = O(n^{-1}).$$

In order to obtain this result for $\mathbb{L}_n$, note that for a primitive word w, we have $N_n^{(0)}(w) - 1 \le N_n^{(0)}(\pi(w)) \le N_n^{(0)}(w)$, then use Lemma 2.1 (see [MZA07, Lemma 4.1] for details). □
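The Chebyshev argument works because $N_n^{(0)}$ has a mean of order $p_1(1-p_1)n$. A run of 0's starts at position 1 when $\omega_1 = 0$, and at position $i \ge 2$ when $\omega_{i-1}\omega_i = 10$, so $\mathbb{E}[N_n^{(0)}] = p_1 + (n-1)p_1(1-p_1)$. The sketch below (our illustration, hypothetical names) checks this identity exactly against an enumeration of $\mathcal{B}^n$ for small n.

```python
from fractions import Fraction
from itertools import groupby, product


def expected_zero_runs_bruteforce(p1: Fraction, n: int) -> Fraction:
    """E[N_n^(0)] for t_n(omega), omega i.i.d. with P(omega_i = 0) = p1, by enumeration."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=n):
        prob = Fraction(1)
        for b in bits:
            prob *= p1 if b == 0 else 1 - p1
        zero_runs = sum(1 for digit, _ in groupby(bits) if digit == 0)
        total += prob * zero_runs
    return total


def expected_zero_runs_formula(p1: Fraction, n: int) -> Fraction:
    """A 0-run starts at position 1 if omega_1 = 0, and at i >= 2 if omega_{i-1} omega_i = 10."""
    return p1 + (n - 1) * (1 - p1) * p1
```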
We also need some information about the length of the longest runs of "$a_1$" in a word $w \in \mathcal{A}^n$ and in its necklace $\langle w\rangle$. Among these long runs, the longest is bound to be the prefix of the smallest Lyndon factor of w, or the prefix of the unique Lyndon word in $\langle w\rangle$. Also, the second longest is bound to be the prefix of the second smallest Lyndon factor of w, or the prefix of the standard right factor of the Lyndon word in $\langle w\rangle$. Furthermore, if Theorem 1.4 is to be true, there should exist at least two long runs and, if Theorem 1.2 is to be true, the number of these long runs should grow indefinitely with n, like the number of Lyndon factors of the random word. These points are consequences of Theorems 1.4 and 1.2, but they are also some of the steps of the proofs of these Theorems. They are addressed by the next Lemmas. In this paper ε denotes a real number in (0, 1/2).

Definition 2.4. (Long runs and short runs) We call long run (resp. short run) of $w \in \mathcal{A}^n$ a run of "$a_1$" with length at least (resp. smaller than) $(1-\varepsilon)\log_{1/p_1} n$. We denote by $H_n(w)$ the number of long runs of "$a_1$" in w.
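Here is a sketch of $H_n(w)$ for a word written with Python characters. It assumes, as our own convention, that 'a' is the smallest letter $a_1$. The arguments `p1` and `eps` are the parameters $p_1$ and ε of Definition 2.4, passed explicitly because they are not determined by the word itself. The function name is hypothetical.

```python
import math
from itertools import groupby


def count_long_runs(word: str, p1: float, eps: float, a1: str = "a") -> int:
    """H_n(w): number of runs of a1 with length >= (1 - eps) * log_{1/p1}(n), n = len(word)."""
    threshold = (1 - eps) * math.log(len(word)) / math.log(1 / p1)
    return sum(
        1
        for letter, group in groupby(word)
        if letter == a1 and len(list(group)) >= threshold
    )
```

With n = 13, $p_1 = 1/2$ and ε = 1/4, the threshold is about 2.78, so in aaaabaabaaaab only the two runs of length 4 count as long.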

Lemma 2.5 (Number of long runs).

$$\mathbb{P}_n(H_n \ge a n^{\varepsilon}) = 1 - O(n^{-1}),$$

and

$$\mathbb{L}_n(H_n \ge a n^{\varepsilon}) = 1 - O(n^{-1}),$$

in which a is a positive constant smaller than $p_1(1-p_1)/4$.

Proof. We choose a positive constant $a \in \big(0, p_1(1-p_1)/4\big)$, so that

$$c = \frac{p_1(1-p_1)}{4} - a > 0.$$

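We assume that random words are produced the same way as in the proof of Lemma 2.3. We let, for $i \ge 1$,

$$B_i = \mathbf{1}_{\{\eta_i \ge (1-\varepsilon)\log_{1/p_1} n\}}.$$

Then, for $\omega \in \Omega$,

$$H_n(t_n(\omega)) \ge \sum_{1 \le 2i-1 \le N_n^{(0)}(\omega)-1} B_{2i-1}(\omega) \quad \text{if } \omega_1 = 0, \tag{4}$$

and

$$H_n(t_n(\omega)) \ge \sum_{1 \le 2i \le N_n^{(0)}(\omega)-1} B_{2i}(\omega) \quad \text{if } \omega_1 = 1. \tag{5}$$

Note also that, under $\mathbb{P}$, $(B_i)_{i\ge1}$ is a Bernoulli process, and that its parameter $p(n,\varepsilon)$ satisfies $n^{\varepsilon-1} \le p(n,\varepsilon) \le n^{\varepsilon-1}/p_1$.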

Thus relations (4) and (5), together with Lemma 2.3, entail that, under $\mathbb{P}_n$, $H_n$ is, roughly speaking, stochastically larger than the binomial distribution with parameters $p_1(1-p_1)n/4$ and $p(n,\varepsilon)$. More precisely, let $S_{\pi_n, p(n,\varepsilon)}$ denote a random variable distributed according to the binomial distribution with parameters $\pi_n = \lfloor p_1(1-p_1)(n-2)/4 \rfloor$ and $p(n,\varepsilon)$. Then

$$\mathbb{P}_n(H_n < a n^{\varepsilon}) \le \mathbb{P}_n\Big(N_n^{(0)} < \tfrac{p_1(1-p_1)}{2}\,n\Big) + \mathbb{P}\big(S_{\pi_n,p(n,\varepsilon)} < a n^{\varepsilon}\big).$$
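But by the inequality of Okamoto [Oka58, Bol01], a binomial random variable $S_{n,p}$ with parameters n and p satisfies, with q = 1 - p:

$$\mathbb{P}\big(|S_{n,p} - pn| \ge h\big) \le \Big(\frac{pqn}{h^2}\Big)^{1/2} \exp\big(-h^2/2pqn\big).$$

As a consequence

$$\begin{aligned}
\mathbb{P}\big(S_{\pi_n,p(n,\varepsilon)} < a n^{\varepsilon}\big) &\le \mathbb{P}\big(S_{\pi_n,p(n,\varepsilon)} - \pi_n p(n,\varepsilon) < (an - \pi_n)\, n^{\varepsilon-1}\big)\\
&\le \mathbb{P}\big(S_{\pi_n,p(n,\varepsilon)} - \pi_n p(n,\varepsilon) < -(4cn - 6)\, n^{\varepsilon-1}/4\big)\\
&\le c_n\, n^{-\varepsilon/2} \exp\Big(-\frac{2c^2 n^{\varepsilon}}{1-p_1}\,\big(1 + o(1)\big)\Big),
\end{aligned}$$

in which

$$\lim_{n} c_n = \frac{\sqrt{1-p_1}}{2c}.$$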



The first statement of the Lemma follows. For the proof of the second statement, we note that if w is a primitive word,

$$H_n \circ \pi(w) \ge H_n(w) - 1, \tag{6}$$

with equality when w begins and ends with long runs. Together with Lemma 2.1, it entails that

$$\begin{aligned}
\mathbb{L}_n(H_n < a n^{\varepsilon} - 1) &\le \mathbb{P}_n\big(\{w \in \mathcal{P}_n : H_n \circ \pi(w) < a n^{\varepsilon} - 1\}\big) + O(\beta^{n/2})\\
&\le \mathbb{P}_n(H_n < a n^{\varepsilon}) + O(\beta^{n/2}),
\end{aligned}$$

and the Lemma follows. □

Recall that $M_n^{(1)}$ denotes the length of the longest run of letters different from $a_1$. We have:

Lemma 2.6 (Large values of the longest runs). Under $\mathbb{P}_n$ or $\mathbb{L}_n$, the probabilities of the events $\{M_n^{(0)} > 2\log_{1/p_1} n\}$ and $\{M_n^{(1)} > 2\log_{1/(1-p_1)} n\}$ are $O(n^{-1})$.

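Proof. First, we give the proof for

$$A_n = \{M_n^{(1)} > 2\log_{1/(1-p_1)} n\}.$$

Again, we assume that the random words are produced the same way as in the proof of Lemma 2.3. For y > 0, we have:

$$\mathbb{P}_n\big(M_n^{(1)} < y\big) \ge \mathbb{P}\big(\forall i \in \{1, \dots, n\},\ \theta_i < y\big) \ge \big(1 - (1-p_1)^{\lceil y\rceil - 1}\big)^n.$$

Choosing $y = 2\log_{1/(1-p_1)} n - 1$, we obtain that

$$\mathbb{P}_n(A_n) = O(n^{-1}).$$

Note that for a primitive word w, we have

$$M_n^{(1)} \circ \pi(w) = \max\big\{M_n^{(1)}(w),\ \big(X_1(w) + X_{N_n}(w)\big)\,\mathbf{1}_{w_1 \ne a_1,\ w_n \ne a_1}\big\}.$$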
Since $\mathbb{P}_n$ is invariant under word reversal, $(X_1, w_1)$ and $(X_{N_n}, w_n)$ have the same probability distribution. Thus, from Lemma 2.1 we deduce that

$$\mathbb{L}_n(A_n) \le 2\,\mathbb{P}_n\big(X_1 \mathbf{1}_{w_1 \ne a_1} > \log_{1/(1-p_1)} n\big) + \mathbb{P}_n(A_n) + O(\beta^{n/2}),$$

which leads to the desired bound for $\mathbb{L}_n(A_n)$. Similar arguments hold for $\{M_n^{(0)} > 2\log_{1/p_1} n\}$. □

3. Long blocks of words and good words 

We mentioned in Section 1.4 that the lengths of the Lyndon factors are governed by the positions of the longest runs of "$a_1$", but this was a rough simplification. As already explained in [MZA07], some of these longest runs have equal lengths, so we need to compare longer factors beginning with these long runs in order to decide which runs are the prefixes of the Lyndon factors. In this section, we prove that almost every word $w \in \mathcal{A}^n$ has a large number of long factors, which we call long blocks, sharing three properties:

Definition 3.1. The long blocks are the factors of w that:

• begin with a long run of "$a_1$",

• end just before another run of "$a_1$" (not necessarily the next run of "$a_1$"),

• have the smallest possible length larger than $1 + 3\log_{1/\beta} n$.
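Here is a sketch of Definition 3.1 on a linear word. It ignores the necklace and wrap-around effects, and a run too close to the end of the word yields no block. The thresholds are passed as hypothetical parameters: `long_threshold` plays the role of $(1-\varepsilon)\log_{1/p_1} n$, and `min_len` is the smallest integer larger than $1 + 3\log_{1/\beta} n$. The example uses small toy values so that blocks are visible.

```python
def long_blocks(word: str, long_threshold: float, min_len: int, a1: str = "a") -> list[str]:
    """Long blocks of a (linear) word, following Definition 3.1.

    A block starts at the beginning of a run of `a1` of length >= long_threshold and
    ends just before a later run of `a1`; among such factors we keep the shortest one
    with at least `min_len` letters.
    """
    n, blocks, i = len(word), [], 0
    while i < n:
        # Skip positions that are not the start of a run of a1.
        if word[i] != a1 or (i > 0 and word[i - 1] == a1):
            i += 1
            continue
        j = i
        while j < n and word[j] == a1:
            j += 1  # word[i:j] is a maximal run of a1
        if j - i >= long_threshold:
            e = i + min_len  # candidate exclusive end of the block
            # The block must end just before a run of a1: word[e] == a1 != word[e - 1].
            while e < n and not (word[e] == a1 and word[e - 1] != a1):
                e += 1
            if e < n:
                blocks.append(word[i:e])
        i = j
    return blocks
```

With `min_len=4`, the word aaabcaabaaabca has two equal long blocks, aaabc. This is exactly the kind of coincidence that points ii. and iii. of the next definition are designed to exclude.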

Our main argument is valid only on a subset of $\mathcal{A}^n$, the set $\mathcal{G}_n$ of good words:

Definition 3.2. A word $w \in \mathcal{A}^n$ is a good word if it satisfies the following conditions:

i. w has at least $\lfloor a n^{\varepsilon}\rfloor$ long blocks,
ii. the long blocks of w do not overlap,
iii. if two long blocks have a common factor, its length is smaller than $3\log_{1/\beta} n$,
iv. for each long run of w there exists a long block beginning with this run,
v. $M_n^{(0)}(w) \le 2\log_{1/p_1} n$,
vi. $M_n^{(1)}(w) \le 2\log_{1/(1-p_1)} n$.

It turns out that $\mathcal{G}_n$ has a large probability:

Proposition 3.3. Under $\mathbb{P}_n$ or $\mathbb{L}_n$, the probability of $\mathcal{G}_n^c$ is $O(n^{2\varepsilon-1}\log^2 n)$.

For the proof of Proposition 3.3 we need a few lemmas:

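Lemma 3.4. Denote by $E_n$ the set of words $w \in \mathcal{A}^n$ in which some $\lceil 3\log_{1/\beta} n\rceil$-letter factor appears twice in the necklace $\langle w\rangle$, at two non-overlapping positions:

$$E_n = \big\{w \in \mathcal{A}^n \mid \exists (w', v, a, b) \in \langle w\rangle \times \mathcal{A}^{\lceil 3\log_{1/\beta} n\rceil} \times \mathcal{A}^* \times \mathcal{A}^* \text{ such that } w' = vavb\big\}.$$

Then, under $\mathbb{P}_n$ or $\mathbb{L}_n$, the probability of $E_n$ is $O(n^{-1})$.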

A key argument of the proof of the main results breaks down if some long block of the decomposition of a random word is a prefix of another long block somewhere else in the word. In order to preclude that, we shall consider blocks with at least $\lceil 3\log_{1/\beta} n\rceil$ letters (at least thrice the length of the longest run(s) of the letter $a_1$), and we shall use Lemma 3.4.



The probability that there exist several runs with the same maximal length inside an n-letter random word does not vanish for n large, so $\log_{1/p_1} n$ characters would be too short.
Proof. We have

$$\mathbb{P}_n(E_n) = O\big(n^2 \beta^{3\log_{1/\beta} n}\big) = O(n^{-1}), \tag{7}$$

in which $n^2$ is a bound for the number of positions of the pair of factors of w, and $\beta^{3\log_{1/\beta} n}$ is a bound for the conditional probability that the second factor is equal to the first factor, given the value of the first factor and the positions of the factors. Due to Lemma 2.1, $\mathbb{L}_n(E_n)$ satisfies

$$\big|\mathbb{L}_n(E_n) - \mathbb{P}_n(\pi^{-1}(E_n))\big| = O(\beta^{n/2}),$$

and $\pi^{-1}(E_n) = E_n \cap \mathcal{P}_n \subset E_n$. □
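Membership in $E_n$ can be tested by brute force on a short word. The sketch below is our illustration and its name is hypothetical. It reads the necklace through the doubled word ww, and looks for two occurrences of an m-letter factor whose starting points are at least m apart in both directions around the circle. That is the condition $w' = vavb$ with |v| = m. In Lemma 3.4, $m = \lceil 3\log_{1/\beta} n\rceil$.

```python
def in_E_n(w: str, m: int) -> bool:
    """True if some m-letter factor occurs twice in the necklace <w> at non-overlapping
    positions, i.e. some rotation of w can be written v a v b with |v| = m."""
    n = len(w)
    doubled = w + w  # every factor of the necklace <w> of length <= n is a factor of ww
    for i in range(n):
        # The second occurrence starts d positions later, with m <= d <= n - m,
        # so that the two occurrences do not overlap around the circle.
        for d in range(m, n - m + 1):
            j = i + d
            if doubled[i : i + m] == doubled[j : j + m]:
                return True
    return False
```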

Lemma 3.5 (Overlap of long blocks). Let $F_n$ denote the set of words $w \in \mathcal{A}^n$ such that some $\lceil 7\log_{1/\beta} n\rceil$-letter factor of $\langle w\rangle$ contains two disjoint long runs. Then, under $\mathbb{P}_n$ or $\mathbb{L}_n$, the probability of $F_n$ is $O(n^{2\varepsilon-1}\log^2 n)$.

Proof. The bound for $\mathbb{P}_n(F_n)$ has three factors:
• a factor n for the position of the $\lceil 7\log_{1/\beta} n\rceil$-letter factor,
• a factor $49(\log_{1/\beta} n)^2$ for the positions of the 2 runs,
• a factor $n^{2\varepsilon-2}$ for the probability of 2 disjoint long runs at 2 specified positions.
The proof extends to $\mathbb{L}_n$ by virtue of Lemma 2.1. □

Lemma 3.6. Let $I_n$ denote the set of words $w \in \mathcal{A}^n$ whose suffix of length $\lceil 6\log_{1/\beta} n\rceil$ contains a long run of "$a_1$". Then, under $\mathbb{P}_n$ or $\mathbb{L}_n$, the probability of $I_n$ is $O(n^{2\varepsilon-1}\log^2 n)$.

Proof. We have

$$\mathbb{P}_n(I_n) \le \lceil 6\log_{1/\beta} n\rceil\, n^{\varepsilon-1}.$$

The factor $\lceil 6\log_{1/\beta} n\rceil$, whose role will be explained in the next proof, counts the number of positions where such a long run could begin. The factor $n^{\varepsilon-1} = p_1^{(1-\varepsilon)\log_{1/p_1} n}$ bounds the probability that a long run begins at some given position. The result for $\mathbb{L}_n$ follows from Lemma 2.1, Lemma 3.5 and

$$\mathbb{P}_n\big(\pi^{-1}(I_n)\big) \le \mathbb{P}_n(F_n).$$

□

Proof of Proposition 3.3. Consider the sets

$$V_n = \{w \in \mathcal{A}^n \mid w \text{ satisfies v. and vi. and } H_n(w) \ge a n^{\varepsilon}\}$$

and

$$G_n = V_n \setminus (E_n \cup F_n \cup I_n).$$

Then, under $\mathbb{P}_n$ or $\mathbb{L}_n$, the probability of $G_n^c$ is $O(n^{2\varepsilon-1}\log^2 n)$, due to Lemmas 2.5, 2.6, 3.4, 3.5 and 3.6. Let us prove that $G_n \subset \mathcal{G}_n$.

Consider a word $w \in G_n$. In order to prove that w satisfies conditions i. and iv., consider a k-letter long run of w, $w = t\,a_1^k\,s$. Since $w \notin I_n$, $\lceil 3\log_{1/\beta} n\rceil$ characters after the beginning of this long run we are still at least $\lceil 3\log_{1/\beta} n\rceil$ characters away from the end of the word w. Since w satisfies condition v., this long run is over at this point:

$$|t| + k < |t| + \lceil 3\log_{1/\beta} n\rceil \le n - \lceil 3\log_{1/\beta} n\rceil.$$
On the other hand, from this point we are at most $\lceil (1-\varepsilon)\log_{1/p_1} n\rceil - 1 + M_n^{(1)}(w)$ characters away from the end of the corresponding long block. This bound is the length of a short run of the letter $a_1$ followed by a run of letters different from $a_1$. But, due to condition vi.,

$$\lceil (1-\varepsilon)\log_{1/p_1} n\rceil - 1 + M_n^{(1)}(w) < \lceil 3\log_{1/\beta} n\rceil,$$

so there is enough room for the long block to end before the end of the word. Thus to each long run is associated a long block, and w satisfies condition iv. It also satisfies condition i., since $H_n(w) \ge a n^{\varepsilon}$.

Let us check that $w \in G_n$ satisfies conditions ii. and iii. A long block is shorter than $\lceil 6\log_{1/\beta} n\rceil$, due to conditions v. and vi. So it can overlap with the next long block only if the 2 corresponding long runs are contained in some $\lceil 7\log_{1/\beta} n\rceil$-letter factor, i.e. it can overlap only if $w \in F_n$. Finally, if w satisfies ii. and fails iii., then $w \in E_n$. □

In the two following sections, we prove separately the main theorems, Theorem 1.4 and Theorem 1.2.

4. Proof of Theorem 1.4

First, let us draw some consequences of the definitions of the previous sections. 
Let $H_n(w)$ be the number of long blocks of a word $w \in \mathcal{A}^n$.

Proposition 4.1. A good Lyndon word $w \in \mathcal{G}_n \cap \mathcal{L}_n$ satisfies the following points:

(1) each long block, by definition a factor of $\langle w\rangle$, is also a factor of w,

(2) long blocks are all distinct,

(3) there exists a smallest (resp. a second smallest) long block,

(4) given a sequence of long blocks $(\zeta_i)_{1\le i\le k}$, sorted in increasing lexicographic order, and any sequence of words $(v_i)_{1\le i\le k}$, the sequence $(\zeta_i v_i)_{1\le i\le k}$ is also sorted in increasing lexicographic order,

(5) the smallest of the long blocks is a prefix of w,

(6) either the second smallest of the long blocks is a prefix of the standard right factor of w, or $r_n(w) = 1 - 1/n$.

Proof. Item (1) follows from point iv. of Definition 3.2. Item (2) follows from point iii. of Definition 3.2. Item (3) follows from item (2) and from point i. of Definition 3.2, as soon as $\lfloor a n^{\varepsilon}\rfloor \ge 2$, since $H_n(w) \ge \lfloor a n^{\varepsilon}\rfloor$. For items (4) and (6), it is useful to remember a basic fact about the lexicographic order. If two words $t_1$ and $t_2$ have prefixes $s_1$ and $s_2$, respectively, such that $s_1 < s_2$, this does not entail $t_1 < t_2$. However, under the additional condition that $s_1$ is not a prefix of $s_2$, $s_1 < s_2$ entails $t_1 < t_2$. Thus item (4) fails only if some $\zeta_i$ is a prefix of some $\zeta_j$, i < j. But this would violate point iii. of Definition 3.2. As a consequence of the definition of Lyndon words, w begins with one of the longest runs of $a_1$ in $\langle w\rangle$. This longest run is a prefix of some long block due to point iv. of Definition 3.2. This, together with item (4), entails item (5).
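The basic fact about the lexicographic order used above can be verified exhaustively on short words. Python's string comparison is the lexicographic order in which a proper prefix is smaller, as here. The sketch below (our illustration, hypothetical names) checks that $s_1 < s_2$, with $s_1$ not a prefix of $s_2$, forces $s_1x < s_2y$ for all tails x, y. The prefix condition cannot be dropped: a < ab, but abb > ab.

```python
from itertools import product


def words_up_to(alphabet: str, max_len: int) -> list[str]:
    """All words over `alphabet` of length 0, 1, ..., max_len."""
    return ["".join(t) for k in range(max_len + 1) for t in product(alphabet, repeat=k)]


def check_prefix_fact(alphabet: str = "ab", max_len: int = 3, tail_len: int = 2) -> bool:
    """Check: if s1 < s2 and s1 is not a prefix of s2, then s1 x < s2 y for all tails x, y."""
    words, tails = words_up_to(alphabet, max_len), words_up_to(alphabet, tail_len)
    for s1, s2 in product(words, repeat=2):
        if s1 < s2 and not s2.startswith(s1):
            if not all(s1 + x < s2 + y for x, y in product(tails, repeat=2)):
                return False
    return True
```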

For item (6), consider the two smallest long blocks, $\zeta_1 < \zeta_2$, in the necklace $\langle w\rangle$. Let $k_1$ and $k_2$ be the lengths of the runs they begin with: $\zeta_1 = a_1^{k_1} v_1$ and $\zeta_2 = a_1^{k_2} v_2$, in which the words $v_i$ do not begin with the letter $a_1$. We know that w necessarily begins with a long run. Thus w begins with a long block, necessarily $\zeta_1$, for $\zeta_1$ is not a prefix of any other long block (see the considerations leading to item (4)). The second smallest word in $\langle w\rangle$, $w_2 = \tau^t w$, begins with $\zeta_2$ or with $a_1^{k_1-1} v_1$. Since $a_1^{k_1-1} v_1$ and $\zeta_2$ are at least $\lceil 3\log_{1/\beta} n\rceil$ letters long, they cannot be prefixes of each other, due to point iii. of Definition 3.2. Thus $r_n(w) = 1 - \frac{1}{n}$ if $a_1^{k_1-1} v_1 < \zeta_2$, and $r_n(w) = 1 - \frac{t}{n}$ if $a_1^{k_1-1} v_1 > \zeta_2$. □

From now on, we do not need to use Bernoulli processes anymore. Rather than discussing runs of 0's and 1's in φ(w), we shall get back to w and discuss, equivalently, runs of letters $a_1$ and $\bar a_1$, in which "runs of $\bar a_1$" stands for "runs without any letter $a_1$".



By Definition 3.2 and Proposition 4.1, a good Lyndon word $w \in \mathcal{G}_n \cap \mathcal{L}_n$ has a unique decomposition

$$w = \beta_1\, g_1\, \beta_2\, g_2 \cdots \beta_{H_n(w)}\, g_{H_n(w)},$$

in which the $\beta_i$'s are a permutation of the long blocks $\zeta_i$'s, now sorted with respect to their position inside w rather than in lexicographic order (but $\beta_1 = \zeta_1$). The $g_i$'s fill the gaps between the $\beta_i$'s. If not empty, they begin with the letter $a_1$ but do not end with the letter $a_1$. As a consequence, if not empty, $g_i$ has a unique decomposition

$$g_i = a_1^{k_1}\, \bar a_1^{\ell_1}\, a_1^{k_2}\, \bar a_1^{\ell_2} \cdots a_1^{k_r}\, \bar a_1^{\ell_r},$$

where r and all the exponents are positive, and $\bar a_1^{\ell}$ stands for a run of ℓ letters different from $a_1$. This leads to the definition of short blocks of good Lyndon words:

Definition 4.2. The short blocks, denoted $(s_j)_j$, of a good Lyndon word $w \in \mathcal{G}_n \cap \mathcal{L}_n$ are the factors $a_1^{k_m}\, \bar a_1^{\ell_m}$ appearing in the unique decomposition of the factors $g_i$. As a consequence, any $w \in \mathcal{G}_n \cap \mathcal{L}_n$ has a unique block-decomposition

$$w = Y_0(w)\, Y_1(w) \cdots Y_{K_n(w)-1}(w)\, Y_{K_n(w)}(w),$$

in which the $Y_i$'s stand either for a long block $\beta_j$ or for a short block $s_j$.

Remark 4.3. Set $k_0 = \lfloor (1-\varepsilon)\log_{1/p_1} n\rfloor$. This decomposition of good Lyndon words can be seen as the decomposition of the elements of some submonoid of $\mathcal{A}^*$, containing $\mathcal{G}_n \cap \mathcal{L}_n$, according to the code $\Pi_n \cup K_n$ defined below:

• $\Pi_n$ contains any word $a_1^{k}\, a_{\ell_1} a_{\ell_2} \cdots a_{\ell_r}$ such that $r \ge 1$, $1 \le k \le k_0$, and $\ell_i \ne 1$ for $1 \le i \le r$.

• $K_n$ contains the elements $t = uv$ of $a_1^{k_0} \mathcal{A}^* \bar a_1$, with $|u| = \lfloor 1 + 3\log_{1/\beta} n\rfloor$, such that t does not contain any factor $a_\ell\, a_1^{k_0}$, $\ell \ne 1$ (long blocks do not overlap), and such that, for $\ell \ne 1$, $a_\ell\, a_1$ is not a factor of v.

Remark 4.4. Note that the short blocks of a word $w \in \mathcal{G}_n \cap \mathcal{L}_n$ have fewer than

$$k_0 + 2\log_{1/(1-p_1)} n < 3\log_{1/\beta} n$$

letters, while the long blocks are not longer than

$$2 + (6-\varepsilon)\log_{1/\beta} n.$$

For a long block, count $\lfloor 2 + 3\log_{1/\beta} n\rfloor$ letters for the minimal size of a long block. Add possibly a run of "$a_1$", a short one due to point ii. of Definition 3.2: it is at most $\lceil -1 + (1-\varepsilon)\log_{1/p_1} n\rceil$ letters long and starts before the $\lfloor 2 + 3\log_{1/\beta} n\rfloor$-limit. Then add a run of "$\bar a_1$", at most $\lceil -1 + 2\log_{1/(1-p_1)} n\rceil$ letters long, due to point vi. of Definition 3.2.


By a code we mean a code as defined in [Lot05, p. 7], for instance.
When the factors of the block-decomposition are sorted according to the lexicographic order, the long blocks $\beta_i$'s turn out to be smaller than the $s_j$'s, since they begin with longer runs of "$a_1$". By Proposition 4.1, the smallest of all these factors is $\beta_1 = Y_0(w)$. Let $J_n(w)$ denote the index of the second smallest factor, and let $d_n$ denote its (normalized) position, defined by

$$d_n(w) = \frac{1}{n} \sum_{i=0}^{J_n(w)-1} |Y_i(w)|. \tag{8}$$

If $w \in a_1 \mathcal{L}_{n-1}$ (this happens with probability $p_1 + o(1)$, according to (10)),

$$r_n(w) = 1 - 1/n.$$

If instead $w \in \mathcal{G}_n \cap (\mathcal{L}_n \setminus a_1\mathcal{L}_{n-1})$, then the second smallest block $Y_{J_n(w)}$, also a long block, is a prefix of the standard right factor, by Proposition 4.1, and

$$r_n(w) = 1 - d_n(w).$$

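When $w \in \mathcal{G}_n$, both cases can be detected by inspection of the two smallest blocks. Let $\mathbb{G}_n$ denote the conditional probability given $\mathcal{G}_n \cap \mathcal{L}_n$:

$$\mathbb{G}_n(A) = \frac{\mathbb{P}_n(A \cap \mathcal{G}_n \cap \mathcal{L}_n)}{\mathbb{P}_n(\mathcal{G}_n \cap \mathcal{L}_n)} = \frac{\mathbb{L}_n(A \cap \mathcal{G}_n \cap \mathcal{L}_n)}{\mathbb{L}_n(\mathcal{G}_n \cap \mathcal{L}_n)}.$$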

Let $U_A$ (resp. $U_d$) denote the uniform probability distribution on a finite set A (resp. the uniform distribution on $[0,1]^d$, for a given integer d). As a first step in the proof of Theorem 1.4, we show that the distribution of $d_n$ under $\mathbb{G}_n$ converges to $U_1$ with respect to the $L_2$-Wasserstein metric $W_2(\cdot,\cdot)$. The $L_2$-Wasserstein metric $W_2(\cdot,\cdot)$ is defined by

$$W_2(\mu, \nu) = \inf_{\mathcal{L}(X) = \mu,\ \mathcal{L}(Y) = \nu} \big(\mathbb{E}\,\|X - Y\|_2^2\big)^{1/2}, \tag{9}$$

in which μ and ν are probability distributions on $\mathbb{R}^d$, $\mathcal{L}(X)$ denotes the law of X, and $\|\cdot\|_2$ denotes the Euclidean norm on $\mathbb{R}^d$. Convergence of $\mathcal{L}(X_n)$ to $\mathcal{L}(X)$ with respect to $W_2(\cdot,\cdot)$ entails convergence of $X_n$ to X in distribution (see [Rac91]). The multidimensional case (d > 1) is needed for Section 5.
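In dimension 1, the infimum in (9) is attained by the monotone (quantile) coupling. This is a classical fact that we use here without proof. The sketch below (our illustration, hypothetical name) computes $W_2$ between the empirical law of a finite sample in [0, 1] and $U_1$. It pairs the i-th smallest point with the i-th quantile interval of length 1/m. For m equally spaced midpoints, the result is $1/(\sqrt{12}\,m)$.

```python
import math


def w2_to_uniform(sample: list[float]) -> float:
    """W_2 between the empirical law of `sample` (points in [0, 1]) and U_1.

    In dimension 1 the optimal coupling is monotone: the i-th smallest point
    is coupled with the uniform law on the quantile interval [i/m, (i+1)/m).
    """
    m = len(sample)
    total = 0.0
    for i, x in enumerate(sorted(sample)):
        lo, hi = i / m, (i + 1) / m
        # Integral over [lo, hi] of (x - t)^2 dt; each atom has mass 1/m = hi - lo.
        total += ((x - lo) ** 3 - (x - hi) ** 3) / 3
    return math.sqrt(total)
```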

As in [MZA07], the key point is the invariance of $\mathbb{G}_n$ under uniform random permutation of the blocks $\{Y_1(w), \dots, Y_{K_n(w)}(w)\}$.

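Notations 4.5. Let $\mathfrak S_n$ denote the set of permutations of $\{1,\dots,n\}$. For $w\in\mathcal G_n\cap\mathcal L_n$ and $\sigma\in\mathfrak S_{K_n(w)}$, we set

$$\sigma.w = Y_0(w)\,Y_{\sigma(1)}(w)\cdots Y_{\sigma(K_n(w))}(w),$$

and

$$C(w) = \{\sigma.w : \sigma\in\mathfrak S_{K_n(w)}\}.$$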

Proposition 4.6. Assume that $w\in\mathcal G_n\cap\mathcal L_n$ and $w'\in C(w)$: then $w'\in\mathcal G_n\cap\mathcal L_n$ and $w'$ has the same multiset of blocks as $w$ (it has the same blocks, with the same multiplicities). As a consequence, for $w, w'\in\mathcal G_n\cap\mathcal L_n$, either $C(w) = C(w')$ or

$$C(w)\cap C(w') = \emptyset.$$
This follows directly from Definition 3.2 and the definition of a code. Let $\mathcal C_n = \{C(w)\,;\ w\in\mathcal G_n\cap\mathcal L_n\}$, and let $\mathfrak C_n$ denote the $\sigma$-algebra generated by $\mathcal C_n$. Also, let $X(w) = (X_i(w))_{i\ge0}$ be the sequence of blocks of $w$ sorted in increasing lexicographic order, followed by an infinite sequence of empty words, and let $\Xi(w) = (\xi_i(w))_{i\ge0}$ be the corresponding sequence of lengths.

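Corollary 4.7. The weight $p(\cdot)$, $X$, $\Xi$, $H_n$ and $K_n$ are $\mathfrak C_n$-measurable, and

$$\mathbb G_n = \sum_{C\in\mathcal C_n}\frac{\mathrm{Card}(C)\,p(C)}{\mathbb P_n(\mathcal G_n\cap\mathcal L_n)}\,\mathcal U_C.$$

Given that $w\in C$, the ranks of the blocks $(Y_i)_{1\le i\le K_n(C)}$ are uniformly distributed.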

Proof. The weight $p(w)$ depends only on the number of letters $a_1, a_2, \dots$ that $w$ contains, not on the order of the letters in $w$, so that $p(\cdot)$ is constant on each $C\in\mathcal C_n$; thus, under $\mathbb G_n$, the conditional distribution of $w$ given that $w\in C$ is $\mathcal U_C$. As a consequence of Proposition 4.6, $\mathcal C_n$ is a partition of $\mathcal G_n\cap\mathcal L_n$, so the relation in Corollary 4.7 is just the decomposition of $\mathbb G_n$ according to its conditional distributions given $\mathcal C_n$. □



Thanks to the previous considerations, the general result below can be applied, in this section, to prove the asymptotic uniformity of $d_n$.

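Lemma 4.8. Let $W_2^{(k)}(\cdot,\cdot)$ denote the $L_2$-Wasserstein metric on $\mathbb R^k$. Consider a random partition of $[0,1)$ into $\ell+2$ intervals $\big([a_i(\omega), b_i(\omega))\big)_{0\le i\le\ell+1}$ with respective (non-random) widths $(x_i)_{0\le i\le\ell+1}$ ($x_i\ge0$, $\sum_i x_i = 1$), sorted according to a uniform random permutation $\omega\in\mathfrak S_\ell$, meaning that, for $1\le j\le\ell$,

$$a_j(\omega) = x_0 + \sum_{i:\ 1\le i\le\ell,\ \omega(i)<\omega(j)} x_i,\qquad b_j(\omega) = x_0 + \sum_{i:\ 1\le i\le\ell,\ \omega(i)\le\omega(j)} x_i,$$

and

$$[a_0(\omega), b_0(\omega)) = [0, x_0),\qquad [a_{\ell+1}(\omega), b_{\ell+1}(\omega)) = [1-x_{\ell+1}, 1).$$

Set $\mathbf a^{(k)} = (a_1,\dots,a_k)$, $1\le k\le\ell$. Then

$$W_2^{(k)}\big(\mathcal L(\mathbf a^{(k)}),\mathcal U_k\big)^2 \le k\left(\frac{x_0^2 + \big(x_{\ell+1}+\max_{1\le i\le\ell}x_i\big)^2}{3} + \frac16\sum_{i=1}^{\ell}x_i^2\right).$$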



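Proof. The proof is similar to the proof of [MZA07, Lemma 6.3], which is the special case $k=1$ of Lemma 4.8. As in the proof of [MZA07, Lemma 6.3], we rather define the random permutation $\omega$ and the sequence $(a_j)$ with the help of a sequence of i.i.d. uniform random variables $(U_1,\dots,U_\ell)$:

$$a_j = x_0 + \sum_{i:\ 1\le i\le\ell,\ U_i<U_j} x_i,\qquad 1\le j\le\ell,$$

that is, $\omega(i)<\omega(j)$ if and only if $U_i<U_j$. Among the many couplings between $\mathbf a^{(k)}$ and $\mathcal U_k$, this special one provides the desired bound on the Wasserstein distance. Indeed, conditionally on $U_j$, $1\le j\le k$, the indicators $\mathbf 1_{U_i<U_j}$, $i\ne j$, are independent Bernoulli variables with parameter $U_j$, and, since $\sum_{i=0}^{\ell+1}x_i = 1$, we obtain

$$\begin{aligned}\mathbb E\big[(U_j-a_j)^2\big] &= \mathbb E\Big[\big(U_j(x_j+x_{\ell+1}) - (1-U_j)x_0\big)^2\Big] + \mathbb E\big[U_j(1-U_j)\big]\sum_{1\le i\le\ell,\ i\ne j}x_i^2\\ &\le \mathbb E\big[(1-U_j)^2\big]x_0^2 + \mathbb E\big[U_j^2\big](x_j+x_{\ell+1})^2 + \mathbb E\big[U_j(1-U_j)\big]\sum_{i=1}^{\ell}x_i^2\\ &\le \frac{x_0^2 + \big(x_{\ell+1}+\max_{1\le i\le\ell}x_i\big)^2}{3} + \frac16\sum_{i=1}^{\ell}x_i^2.\end{aligned}$$

Summing over $1\le j\le k$ gives the bound of Lemma 4.8, since $W_2^{(k)}\big(\mathcal L(\mathbf a^{(k)}),\mathcal U_k\big)^2\le\sum_{j=1}^k\mathbb E\big[(U_j-a_j)^2\big]$.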



We doubt that such computations are new. In fact, the argument can be adapted (taking the $x_j$'s in $\{0, 1/n\}$) to compute the squared $L_2$ distance $t(1-t)/n$ between an evaluation $F_n(t)$ of the empirical distribution function and $t$; cf. [SW09, Ch. 3.1, p. 85, display (3)]. □
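The coupling used in this proof is easy to simulate. The Python sketch below (helper names and the example widths are ours) draws i.i.d. uniforms, builds the left endpoints $a_j$, and compares a Monte Carlo estimate of $\mathbb E[(U_1-a_1)^2]$ with the closed form obtained by integrating the conditional expression over $U_j$, and with the per-coordinate bound $\frac{x_0^2+(x_{\ell+1}+\max_i x_i)^2}{3}+\frac16\sum_i x_i^2$. It is a numerical sanity check, not part of the proof.

```python
import random


def coupled_positions(widths, rng):
    """One draw of the coupling: widths = [x_0, x_1, ..., x_l, x_{l+1}], sum 1.
    Middle intervals are ordered by i.i.d. uniforms; a_j is the left endpoint
    of interval j, i.e. x_0 + sum of x_i over the i with U_i < U_j."""
    x0, middle = widths[0], widths[1:-1]
    us = [rng.random() for _ in middle]
    a = [x0 + sum(x for x, v in zip(middle, us) if v < u) for u in us]
    return us, a


def exact_mse(widths, j):
    """E[(U_j - a_j)^2], j 1-based: integrate over u in (0, 1) the conditional value
    (u (x_j + x_{l+1}) - (1 - u) x_0)^2 + u (1 - u) sum_{i != j} x_i^2."""
    x0, xl1 = widths[0], widths[-1]
    c, q = widths[j] + xl1 + x0, x0        # integrand of first term: (c u - q)^2
    s = sum(x * x for i, x in enumerate(widths[1:-1], start=1) if i != j)
    return c * c / 3 - c * q + q * q + s / 6


def bound(widths):
    """Per-coordinate bound from the last line of the display."""
    x0, xl1, middle = widths[0], widths[-1], widths[1:-1]
    return (x0 ** 2 + (xl1 + max(middle)) ** 2) / 3 + sum(x * x for x in middle) / 6


widths = [0.1, 0.2, 0.2, 0.2, 0.2, 0.1]    # example values: x_0, x_1..x_4, x_5
rng = random.Random(1)
draws = [coupled_positions(widths, rng) for _ in range(20000)]
mc = sum((us[0] - a[0]) ** 2 for us, a in draws) / len(draws)
print(mc, exact_mse(widths, 1), bound(widths))
```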

We shall need the full generality of Lemma 4.8 in Section 5. In this section, we specialize Lemma 4.8 to $k=1$. If $\nu_n$ denotes the distribution of $d_n$ under $(\mathcal G_n\cap\mathcal L_n, \mathbb G_n)$, we deduce:

Theorem 4.9 (Position of the second smallest block).

$$W_2(\nu_n, \mathcal U_1) = O\left(\sqrt{\frac{\log n}{n}}\right).$$

As a consequence, under $\mathbb G_n$, the moments of $d_n$ converge to the corresponding moments of $\mathcal U_1$.



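Proof. The proof of [MZA07, Theorem 6.4] holds step by step: if $\nu_C$ is the conditional distribution of $d_n(w)$ given that $w\in C$, then $\nu_C$ is also the image of the uniform probability on $\mathfrak S_{K_n(C)}$ by the map $\sigma\mapsto d_n(\sigma.w)$. Thus Lemma 4.8 (with $k=1$, $x_{\ell+1}=0$ and $x_i = |Y_i(w)|/n$) and (8) lead to

$$W_2(\nu_C, \mathcal U_1)^2 \le \frac{x_0^2 + \max_{i\ge1}x_i^2}{3} + \frac16\sum_{i\ge1}x_i^2.$$

Then, Corollary 4.7 entails

$$W_2(\nu_n, \mathcal U_1)^2 \le \sum_{C\in\mathcal C_n}\mathbb G_n(C)\,W_2(\nu_C, \mathcal U_1)^2,$$

and we conclude with the help of $\sum_j x_j^2\le\big(\sum_j x_j\big)\times\max_j x_j$ and of Remark 4.4. □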



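As in [MZA07, Theorem 6.5], asymptotic independence between $\mathfrak C_n$ and $d_n$ holds under $\mathbb G_n$: for a $\mathfrak C_n$-measurable $\mathbb R$-valued statistic $W_n$ with probability distribution $\chi_n$,

$$W_2\big((W_n, d_n),\ \chi_n\otimes\mathcal U_1\big) = O\left(\sqrt{\frac{\log n}{n}}\right). \tag{10}$$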



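In order to prove Theorem 1.4, let $\mu_n$ (resp. $\tilde\mu_n$) denote the image of $\mathbb L_n$ (resp. of $\mathbb G_n$) by $r_n$. Set

$$\mathcal E^1_n = (\mathcal G_n\cap\mathcal L_n)\cap a_1\mathcal L_{n-1} = \mathcal G_n\cap a_1\mathcal L_{n-1},\qquad \mathcal E^2_n = (\mathcal G_n\cap\mathcal L_n)\setminus\mathcal E^1_n.$$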
We remark that:

i. if $w\in\mathcal E^1_n$, $r_n(w) = 1 - \frac1n$ holds true⁵;
ii. if $w\in\mathcal E^2_n$, $r_n(w) = 1 - d_n(w)$;

iii. when $w\in\mathcal L_n\setminus(\mathcal G_n\cap\mathcal L_n)$, the crude bound $0\le r_n(w)\le 1$ will prove more than sufficient for our purposes.

First, the conditional law $\tilde\nu$, given $A$, of a bounded r.v. $X$ defined on a probability space $\Omega$, is Wasserstein-close to its unconditional law $\nu$ if $A$ is close to $\Omega$. More precisely,

$$W_2(\tilde\nu, \nu) \le 2\,\mathbb P(\Omega\setminus A)^{1/2}\,\|X\|_\infty. \tag{11}$$
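One way to see (11): let $X'$ equal $X$ on $A$ and, on $\Omega\setminus A$, an independent draw from $\tilde\nu$; then $X'$ has law $\tilde\nu$, and $\mathbb E|X-X'|^2\le\mathbb P(\Omega\setminus A)\,(2\|X\|_\infty)^2$. The Python sketch below checks (11) on a toy example of our choosing ($X$ uniform on $\{0,\dots,9\}$, $A=\{X\le 8\}$), computing $W_2$ exactly through the one-dimensional quantile coupling; the function name `w2_discrete` is hypothetical.

```python
import math


def w2_discrete(xs, ps, ys, qs, tol=1e-12):
    """Exact W_2 between sum_i ps[i] delta_{xs[i]} and sum_j qs[j] delta_{ys[j]}
    (xs, ys increasing), via the quantile coupling on [0, 1]."""
    i = j = 0
    cp, cq = ps[0], qs[0]          # cumulative masses reached so far
    t, total = 0.0, 0.0
    while i < len(xs) and j < len(ys):
        nxt = min(cp, cq)
        total += (nxt - t) * (xs[i] - ys[j]) ** 2
        t = nxt
        if cp <= nxt + tol:        # quantile of the first law jumps
            i += 1
            if i < len(xs):
                cp += ps[i]
        if cq <= nxt + tol:        # quantile of the second law jumps
            j += 1
            if j < len(ys):
                cq += qs[j]
    return math.sqrt(total)


# Toy example: X uniform on {0, ..., 9}, A = {X <= 8}, so P(Omega \ A) = 0.1
xs, ps = list(range(10)), [0.1] * 10
ys, qs = list(range(9)), [1 / 9] * 9
w2 = w2_discrete(xs, ps, ys, qs)
rhs = 2 * math.sqrt(0.1) * 9       # 2 P(Omega \ A)^{1/2} ||X||_inf
print(w2, rhs)
```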

As a consequence, point iii., together with Proposition 3.3, entails that

$$W_2(\mu_n, \tilde\mu_n) = o\big(n^{-1/2+\varepsilon}\log n\big).$$

Thus we shall now work on $\mathcal G_n\cap\mathcal L_n$, under $\mathbb G_n$, for $\mu_n$ has the same asymptotic behaviour as $\tilde\mu_n$.

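On $\mathcal G_n\cap\mathcal L_n$, we have, according to points i. and ii.,

$$r_n = f_n\big(d_n, \mathbf 1_{\mathcal E^2_n}\big) = (1-d_n)\,\mathbf 1_{\mathcal E^2_n} + \Big(1-\frac1n\Big)\big(1-\mathbf 1_{\mathcal E^2_n}\big).$$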

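The $\mathfrak C_n$-measurability of $\mathcal E^2_n$ (see [MZA07, Section 7] for more details) and relation (10) entail that

$$W_2\big((\mathbf 1_{\mathcal E^2_n}, d_n),\ \chi_n\otimes\mathcal U_1\big) = O\left(\sqrt{\frac{\log n}{n}}\right), \tag{12}$$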

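in which $\chi_n$ denotes the probability distribution of $\mathbf 1_{\mathcal E^2_n}$. Thus, there exists a probability space and, defined on this probability space, a couple $(W_n, U)$ with distribution $\chi_n\otimes\mathcal U_1$ and a copy⁶ of $(\mathbf 1_{\mathcal E^2_n}, d_n)$ whose $L_2$ distance satisfies

$$\big\|\mathbf 1_{\mathcal E^2_n} - W_n\big\|_2^2 + \|d_n - U\|_2^2 = O\left(\frac{\log n}{n}\right).$$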

Set

$$\tilde r_n = (1-U)W_n + \Big(1-\frac1n\Big)(1-W_n).$$

The inequality

$$|f_n(d,w) - f_n(d',w')|^2 \le 2\big(|d-d'|^2 + |w-w'|^2\big),$$

which holds for $(w,w',d,d')\in[0,1]^4$, entails that

$$W_2(\tilde\mu_n, \tilde r_n) = O\left(\sqrt{\frac{\log n}{n}}\right).$$
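The inequality above follows from the identity $f_n(d,w)-f_n(d',w') = (w-w')\big(\frac1n-d\big) + w'(d'-d)$, in which both coefficients lie in $[-1,1]$; hence $|f_n(d,w)-f_n(d',w')|\le|w-w'|+|d-d'|$, and $(a+b)^2\le2(a^2+b^2)$. A quick randomized check of both facts in Python (a sanity test, not a proof; the function names are ours):

```python
import random


def f_n(d, w, n):
    """r_n written as a function of (d_n, 1_{E^2_n}), extended to (d, w) in [0, 1]^2."""
    return (1 - d) * w + (1 - 1 / n) * (1 - w)


def check(n, trials=10000, seed=0):
    """Test the identity and the Lipschitz-type inequality at random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        d, d2, w, w2 = (rng.random() for _ in range(4))
        diff = f_n(d, w, n) - f_n(d2, w2, n)
        identity = (w - w2) * (1 / n - d) + w2 * (d2 - d)
        assert abs(diff - identity) < 1e-12
        assert diff ** 2 <= 2 * ((d - d2) ** 2 + (w - w2) ** 2) + 1e-12
    return True


print(check(5), check(1000))
```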



⁵Actually, $r_n(w) = 1 - \frac1n$ holds true if $w\in a_k\mathcal L_{n-1}(a_k, a_{k+1}, \dots, a_n)$, but, since $w\in\mathcal G_n$, $w$ contains at least one occurrence of the letter $a_1$, which precludes $w\in a_k\mathcal L_{n-1}(a_k, a_{k+1}, \dots, a_n)$ for $k\ge2$.

⁶Denoted $(\mathbf 1_{\mathcal E^2_n}, d_n)$ for the sake of economy.




Finally, using an optimal coupling $(W_n, \tilde W_n)$ in which $\tilde W_n$ is a Bernoulli random variable with expectation $1-p_1$, independent of $U$, set

$$\tilde f_n = (1-U)\tilde W_n + \Big(1-\frac1n\Big)(1-\tilde W_n).$$



As above, we easily obtain

$$W_2(\tilde f_n, \tilde r_n) \le \sqrt2\,W_2(W_n, \tilde W_n).$$



Also

$$\tilde f_n = (1-U)\tilde W_n + (1-\tilde W_n) - \frac1n(1-\tilde W_n),$$

in which $(1-U)\tilde W_n + (1-\tilde W_n)$ has distribution $\mu$. Thus

$$W_2(\tilde f_n, \mu) \le \frac1n.$$
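The coupling with the limit law $\mu = p_1\delta_1 + (1-p_1)\mathbf 1_{[0,1]}(x)\,dx$ can be checked numerically: built from the same $U$ and $\tilde W_n$, the variables $\tilde f_n$ and $(1-U)\tilde W_n + (1-\tilde W_n)$ differ by $\frac1n(1-\tilde W_n)\le\frac1n$. A Python sketch with arbitrary values $p_1 = 0.3$ and $n = 50$ (our choices):

```python
import random


def coupled_draw(p1, n, rng):
    """Return (f_tilde_n, mu_draw) built from the same U and W ~ Bernoulli(1 - p1)."""
    u = rng.random()
    w = 1 if rng.random() < 1 - p1 else 0
    f_tilde = (1 - u) * w + (1 - 1 / n) * (1 - w)
    mu_draw = (1 - u) * w + (1 - w)   # atom at 1 w.p. p1, otherwise uniform
    return f_tilde, mu_draw


p1, n = 0.3, 50
rng = random.Random(2)
pairs = [coupled_draw(p1, n, rng) for _ in range(100000)]
max_gap = max(abs(f - m) for f, m in pairs)
mean_mu = sum(m for _, m in pairs) / len(pairs)   # expected p1 + (1 - p1) / 2
print(max_gap, mean_mu)
```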

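Now

$$\mathbb P_n(a_1\mathcal L_{n-1}) - \mathbb P_n(\mathcal L_n\setminus\mathcal G_n) \le \mathbb P_n(\mathcal E^1_n) \le \mathbb P_n(a_1\mathcal L_{n-1}).$$

So, by Proposition 3.3 and the fact that $\mathbb P_n(\mathcal L_n) = \frac1n\big(1 - O(\beta^{n/2})\big)$, we obtain

$$\big|\mathbb G_n(\mathcal E^1_n) - p_1\big| = O\big((\log n)^2\,n^{-1}\big) \tag{13}$$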

and 

$$W_2(\tilde f_n, \tilde r_n) = o\big(n^{-1/2+\varepsilon}\log n\big).$$

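Combining these bounds, the triangle inequality and the bound on $W_2(\mu_n, \tilde\mu_n)$ obtained from (11), we get

$$W_2(\mu_n, \mu) = o\big(n^{-1/2+\varepsilon}\log n\big).$$

Since $0\le r_n\le 1$, convergence of moments follows. □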

5. Proof of Theorem 1.2

According to Definitions 3.1 and 3.2, for $w\in\mathcal G_n$, the number of long blocks and the number of long runs are the same. Thus $w$ has a unique decomposition

$$w = g_1\,\beta_1\,g_2\,\beta_2\cdots\beta_{H_n(w)}\,g_{H_n(w)+1},$$

in which the $\beta_i$'s are long blocks and $g_1$, $g_{H_n(w)+1}$ and $(g_i)_{2\le i\le H_n(w)}$ are some words in $A^*$. The factors $g_1$ and $g_{H_n(w)+1}$ have a unique factorization:

$$g_1 = \bar a_1^{h}\,a_1^{m_1}\bar a_1^{k_1}\cdots a_1^{m_r}\bar a_1^{k_r} := \bar a_1^{h}\,\hat g_1$$

and

$$g_{H_n(w)+1} = a_1^{m'_1}\bar a_1^{k'_1}\cdots a_1^{m'_{r'}}\bar a_1^{k'_{r'}}\,a_1^{h'} := \hat g_{H_n(w)+1}\,a_1^{h'},$$

in which "$\bar a_1^k$" denotes a run of $k$ letters that does not contain the letter "$a_1$", and $h$, $h'$ and all powers are nonnegative. Then, if a factor $g_i$, $i\notin\{1, H_n(w)+1\}$, is nonempty, it has a unique decomposition

$$g_i = a_1^{m_1}\bar a_1^{k_1}\cdots a_1^{m_r}\bar a_1^{k_r},$$

in which $r$ and all the exponents are positive. Now let us define the short blocks of good words:

Definition 5.1. The short blocks, denoted $(s_j)_j$, of a good word $w\in\mathcal G_n$ are the factors $a_1^{m}\bar a_1^{k}$ appearing in the unique decompositions of the factors $g_i$.
As a consequence, any word $w\in\mathcal G_n$ has a unique block decomposition

$$w = \bar a_1^{k(w)}\,Y_1(w)\cdots Y_{K'_n(w)-1}(w)\,Y_{K'_n(w)}(w)\,a_1^{L_n(w)},$$

in which $k(w)\ge0$, $L_n(w)\ge0$ and the $Y_j$'s are either long blocks or short blocks. Let $J_{i,n}(w)$, $1\le i\le H_n(w)$, denote the index of the $i$-th smallest block of $w\in\mathcal G_n$: since $i\le H_n(w)$, $Y_{J_{i,n}(w)}$ has to be a long block. Let $d_{i,n}(w)$, $1\le i\le H_n(w)$, denote the normalized position of $Y_{J_{i,n}(w)}(w)$ in $w$, defined as the ratio $|u|/|w|$, in which $w$ has the factorization $w = u\,Y_{J_{i,n}(w)}(w)\,v$. The normalized position $d_{i,n}(w)$ is given by the formula

$$d_{i,n}(w) = \frac1n\Big(k(w) + \sum_{1\le j<J_{i,n}(w)}|Y_j(w)|\Big),\qquad i = 1,\dots,H_n(w).$$

For a word $\omega\in A^n$, it is convenient to complete the sequence $(d_{k,n}(\omega))_{1\le k\le H_n(\omega)}$ by an infinite sequence of 0's. For a word $\omega\in\mathcal G_n$, this is not much of a perturbation, since the original sequence is very long: according to Lemma 2.5, the probability that $H_n(\omega)$ is smaller than $a n^\varepsilon$ vanishes.

Let $\mathbb G_n$ denote the conditional probability given $\mathcal G_n$:

$$\mathbb G_n(A) = \frac{\mathbb P_n(A\cap\mathcal G_n)}{\mathbb P_n(\mathcal G_n)}.$$

By arguments similar to those in Section 4, we obtain that, for any $k\ge1$, the sequence of random variables $(d_{i,n})_{1\le i\le k}$ is, under $\mathbb G_n$, asymptotically uniform on $[0,1]^k$. Once again, the key point is the invariance of $\mathbb G_n$ under uniform random permutations of the blocks $Y_i$: let $\sigma\in\mathfrak S_{K'_n(w)}$ act on $w$ by permutation of blocks:

$$\sigma.w = \bar a_1^{k(w)}\,Y_{\sigma(1)}(w)\cdots Y_{\sigma(K'_n(w))}(w)\,a_1^{L_n(w)}.$$

The action is slightly different from the action defined in Section 4, since the decomposition is different and, in addition to the prefix $\bar a_1^{k(w)}$, a possible suffix $a_1^{L_n(w)}$ is also left untouched by the permutation. Let

$$C'(w) = \{\sigma.w : \sigma\in\mathfrak S_{K'_n(w)}\}$$

denote the orbit of $w$ under that action, and let $\mathfrak C'_n$ be the $\sigma$-algebra generated by $\mathcal C'_n = \{C'(w)\,;\ w\in\mathcal G_n\}$. The proof of the next theorem is similar to the proof of Theorem 4.9 in Section 4.

Theorem 5.2 (Positions of the $k$ first smallest blocks). Let $\nu_{k,n}$ be the distribution of $\mathbf d_{k,n} = (d_{i,n})_{1\le i\le k}$ under $(\mathcal G_n, \mathbb G_n)$. We have

$$W_2(\nu_{k,n}, \mathcal U_k) = O\left(\sqrt{\frac{\log n}{n}}\right).$$



So far, we have seen that the normalized positions of the $H_n$ smallest blocks are asymptotically independent and uniformly distributed on $[0,1]$. These $H_n$ blocks are the prefixes of the $H_n$ smallest words in the necklace of $w$. However, some of these $H_n$ blocks are not prefixes of Lyndon factors of $w$, unless the sequence $i\mapsto J_{i,n}$ is decreasing. For instance, in a word containing 9 long blocks with lexicographic ranks going from 1 to 9, the blocks could be placed along the word in the following way:

...4...8...3...5...7...1...9...2...6....

In this example, the long blocks which are prefixes of Lyndon factors of this word are the blocks 1, 3 and 4, those whose ranks constitute (low) records of the sequence 483571926: the long blocks with ranks 9, 2 and 6 are immersed in the first Lyndon factor, starting with $Y_{J_{1,n}}$, and the long blocks with ranks 5 and 7 are immersed in the second Lyndon factor, which starts with $Y_{J_{3,n}}$. Note that the largest (and shortest) Lyndon factors, which do not begin with long blocks, do not appear in this list of $H_n$ factors but, as a consequence of Theorem 1.2, the total length of these largest factors is $o(n)$: once normalized by $n$, their length does not contribute to the asymptotic behavior of the factorization.
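In the example above, cutting the sequence of ranks at its low records recovers the three groups of long blocks. The short Python sketch below does exactly that grouping on the ranks only; it deliberately ignores short blocks, the prefix and the suffix, and the function names are hypothetical.

```python
def low_record_positions(ranks):
    """0-based indices of the left-to-right minima (low records) of `ranks`."""
    out, best = [], float("inf")
    for i, r in enumerate(ranks):
        if r < best:
            out.append(i)
            best = r
    return out


def group_into_factors(ranks):
    """Split the sequence of ranks into groups, each starting at a low record."""
    starts = low_record_positions(ranks)
    ends = starts[1:] + [len(ranks)]
    return [ranks[s:e] for s, e in zip(starts, ends)]


ranks = [4, 8, 3, 5, 7, 1, 9, 2, 6]
print(group_into_factors(ranks))   # groups led by blocks 4, 3 and 1
```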

By invariance of $\mathbb G_n$ under uniform random permutations of the blocks $Y_i$, the sequence of ranks of the long blocks is, conditionally given that $H_n = k$, a uniform random permutation of $\mathfrak S_k$. Thus the conditional distribution of the number $\Lambda_n$ of Lyndon factors obtained this way, given that $H_n = k$, is the law of the number of records (or of cycles) of a uniform random permutation in $\mathfrak S_k$ (see [ABT03, Ch. 1] or [Lot02, Ch. 11]), with generating function

$$\frac1{k!}\,x(x+1)(x+2)\cdots(x+k-1) = \frac1{k!}\sum_{0\le j\le k}\left[{k\atop j}\right]x^j,$$

in which $\left[{k\atop j}\right]$ is a Stirling number of the first kind. We can thus describe the conditional law of $\Lambda_n$ as follows: consider a sequence $B = (B_i)_{i\ge1}$ of independent Bernoulli random variables with respective parameters $1/i$, $B$ and $H_n$ being independent. Set

$$S_n = \sum_{1\le i\le n}B_i$$

and

$$A_n = S_{H_n} = \sum_{i\ge1}B_i\,\mathbf 1_{i\le H_n}.$$

Then $\Lambda_n$ and $A_n$ have the same distribution, and we shall use the notation $A_n$ for both of them. The following lemma ensures that, with probability close to 1, $A_n$ is at least of order $\log n$.
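The identification of the record count with a sum of independent Bernoulli variables can be checked exactly for small $k$, since $\prod_{i=1}^k\big((1-\frac1i)+\frac xi\big) = \frac1{k!}x(x+1)\cdots(x+k-1)$. The Python sketch below (function names are ours) compares brute-force record counts over $\mathfrak S_k$, the coefficients of $x(x+1)\cdots(x+k-1)$, and the exact law of $S_k$.

```python
from fractions import Fraction
from itertools import permutations


def count_records(perm):
    """Number of low records (left-to-right minima) of a sequence."""
    best, count = float("inf"), 0
    for x in perm:
        if x < best:
            best, count = x, count + 1
    return count


def stirling_first_kind(k):
    """[c(k,0), ..., c(k,k)]: coefficients of x(x+1)...(x+k-1)."""
    row = [1]                              # empty product for k = 0
    for m in range(k):                     # multiply the polynomial by (x + m)
        new = [0] * (len(row) + 1)
        for j, c in enumerate(row):
            new[j] += m * c
            new[j + 1] += c
        row = new
    return row


def record_distribution_bruteforce(k):
    """How many permutations of {0, ..., k-1} have j records, for j = 0..k."""
    counts = [0] * (k + 1)
    for p in permutations(range(k)):
        counts[count_records(p)] += 1
    return counts


def bernoulli_sum_pmf(k):
    """Exact law of S_k = B_1 + ... + B_k, with B_i ~ Bernoulli(1/i) independent."""
    pmf = [Fraction(1)]
    for i in range(1, k + 1):
        p = Fraction(1, i)
        new = [Fraction(0)] * (len(pmf) + 1)
        for j, c in enumerate(pmf):
            new[j] += c * (1 - p)
            new[j + 1] += c * p
        pmf = new
    return pmf


print(stirling_first_kind(5), record_distribution_bruteforce(5))
```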

Lemma 5.3.

$$\mathbb P_n\big(A_n < \varepsilon\log n/3\big) = O\left(\frac1{\log n}\right).$$

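Proof. The $m$-th harmonic number has the asymptotic expansion

$$H_m = \sum_{i=1}^m\frac1i = \ln m + \gamma + \frac1{2m} - \frac1{12m^2} + O(m^{-4}),$$

in which $\gamma$ is the Euler-Mascheroni constant (see [CG96]). We have

$$\mathbb E(S_n) = H_n\quad\text{and}\quad\mathrm{Var}(S_n) = H_n - \sum_{i=1}^n\frac1{i^2}.$$

By Lemma 2.5,

$$\mathbb P_n\big(A_n<\varepsilon\log n/3\big) \le \mathbb P_n\big(A_n<\varepsilon\log n/3 \mid H_n\ge a n^\varepsilon\big) + O(n^{-1}).$$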
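In addition,

$$\begin{aligned}\mathbb P_n\big(A_n<\varepsilon\log n/3 \mid H_n\ge a n^\varepsilon\big) &\le \mathbb P_n\big(S_{an^\varepsilon}<\varepsilon\log n/3\big)\\ &\le \mathbb P_n\big(|S_{an^\varepsilon} - \mathbb E(S_{an^\varepsilon})| > \varepsilon\log n/2\big)\\ &= O\left(\frac1{\log n}\right),\end{aligned}$$

in which the second inequality holds true for $n$ large enough, and the last equality follows from the Bienaymé-Chebyshev inequality. □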
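The two ingredients of this proof are easy to evaluate numerically. The Python sketch below (our names; the value of $\gamma$ is hard-coded) compares $H_m$ with its truncated expansion, computes the mean and variance of $S_n$, and evaluates the Chebyshev bound $\mathrm{Var}(S_m)/(\varepsilon\log n/2)^2$ with $m=\lceil an^\varepsilon\rceil$, which behaves like $4/(\varepsilon\log n)$ and therefore decreases slowly in $n$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)


def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m (accurately summed)."""
    return math.fsum(1 / i for i in range(1, m + 1))


def harmonic_expansion(m):
    """Truncated expansion ln m + gamma + 1/(2m) - 1/(12 m^2)."""
    return math.log(m) + EULER_GAMMA + 1 / (2 * m) - 1 / (12 * m * m)


def mean_var_S(n):
    """Mean H_n and variance H_n - sum 1/i^2 of S_n = B_1 + ... + B_n."""
    mean = harmonic(n)
    var = mean - math.fsum(1 / (i * i) for i in range(1, n + 1))
    return mean, var


def chebyshev_bound(n, eps, a):
    """Var(S_m) / (eps log n / 2)^2 with m = ceil(a n^eps)."""
    m = math.ceil(a * n ** eps)
    _, var = mean_var_S(m)
    return var / (eps * math.log(n) / 2) ** 2


print(harmonic(1000) - harmonic_expansion(1000))
print(chebyshev_bound(10 ** 4, 0.5, 1.0), chebyshev_bound(10 ** 8, 0.5, 1.0))
```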

Let $L$ denote a geometric random variable with parameter $1-p_1$, such that, for $k\ge0$,

$$\mathbb P(L=k) = p_1^k(1-p_1),$$

and let $U = (U_k)_{k\ge1}$ be a sequence of independent random variables, uniform on $(0,1)$. Assuming that $U$ and $L$ are independent, let $\mu$ denote the law of the pair $(L, U)$. We complete the sequences $(d_{k,n})_{1\le k\le H_n}$ and $(p_{k,n})_{1\le k\le H_n}$ by 0's in order to form the infinite sequences $d_n$ and $p^{(n)}$. There are three steps:

(1) we use Theorem 5.2 to derive the convergence of $(L_n, d_n)_{n\ge1}$ to $\mu$;

(2) we prove that the sequence $s_n$ defined below is asymptotically the image of $(L_n, d_n)$ by a functional $\mathcal C$ whose domain of continuity $\mathcal C_c$ satisfies $\mu(\mathcal C_c) = 1$;

(3) we check that the distribution of $\mathcal C(L, U)$ conforms to the description given in Theorem 1.2.

For step 1, thanks to [Kal93, Theorem 3.29], we know that weak convergence holds for the infinite sequences if weak convergence of the distribution (under $\mathbb P_n$) of the finite sequence $(L_n, (d_{i,n})_{1\le i\le k})$ holds for arbitrary $k$. Again, as in [MZA07, Theorem 6.5], asymptotic independence between $\mathfrak C'_n$ and $(d_{i,n})_{1\le i\le k}$ holds under $\mathbb G_n$: for a $\mathfrak C'_n$-measurable $\mathbb R$-valued statistic $W_n$ with probability distribution $\chi_n$,

$$W_2\big((W_n, (d_{i,n})_{1\le i\le k}),\ \chi_n\otimes\mathcal U_k\big) = O\left(\sqrt{\frac{\log n}{n}}\right). \tag{14}$$

Now, it follows from the definition of $L_n$ that $L_n$, or $W_n = e^{-L_n}$, is $\mathfrak C'_n$-measurable, i.e. constant on each $C'(w)$. Let $\tilde\chi_n$ (resp. $\chi$) denote the distribution of $W_n$ under $\mathbb P_n$ (resp. the distribution of $e^{-L}$). Due to Proposition 3.3 and to relation (11),

$$W_2(\tilde\chi_n, \chi_n)\ \big(= W_2(\tilde\chi_n\otimes\mathcal U_k, \chi_n\otimes\mathcal U_k)\big) = O\big(n^{-1/2+\varepsilon}\log n\big).$$

Now, under $\mathbb P_n$, $L_n$ has the same law as $L\wedge n$. This is perhaps clearer when one considers the word $\bar w$ obtained by reading the word $w$ from right to left: clearly, under $\mathbb P_n$, $\bar L_n$ defined by

$$\bar L_n(w) = L_n(\bar w)$$

has the same law as $L_n$, for $w$ and $\bar w$ have the same weight. But $\bar L_n$, the length of the initial run of $a_1$'s in $w$, has the same law as $L\wedge n$. Thus a.s. convergence of $L\wedge n$ to $L$ entails that

$$W_2(\tilde\chi_n, \chi)\ \big(= W_2(\tilde\chi_n\otimes\mathcal U_k, \chi\otimes\mathcal U_k)\big) = O(e^{-n}).$$

Weak convergence of $(L_n, (d_{i,n})_{1\le i\le k})$ follows at once.
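The claim that $L_n$ has the law of $L\wedge n$ under $\mathbb P_n$ can be checked by simulation. The Python sketch below is illustrative only: it collapses every letter other than $a_1$ into a single symbol (only membership in $\{a_1\}$ matters for $L_n$), and the values $p_1=0.5$, $n=20$ and the function names are our choices.

```python
import random


def trailing_run(word, letter):
    """Length of the final run of `letter` in `word` (the statistic L_n)."""
    k = 0
    for c in reversed(word):
        if c != letter:
            break
        k += 1
    return k


def simulate(p1, n, trials, rng):
    """Empirical law of L_n for i.i.d. letters; 1 stands for a_1, 2 for any other letter."""
    freq = {}
    for _ in range(trials):
        word = [1 if rng.random() < p1 else 2 for _ in range(n)]
        k = trailing_run(word, 1)
        freq[k] = freq.get(k, 0) + 1
    return {k: v / trials for k, v in freq.items()}


def law_L_wedge_n(p1, n):
    """Exact law of min(L, n) when P(L = k) = p1**k * (1 - p1)."""
    law = {k: p1 ** k * (1 - p1) for k in range(n)}
    law[n] = p1 ** n
    return law


emp = simulate(0.5, 20, 20000, random.Random(3))
exact = law_L_wedge_n(0.5, 20)
print([(k, round(emp.get(k, 0), 3), exact[k]) for k in range(4)])
```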

For point 2, let $T$ be the functional that shifts a sequence $u$ as follows:

$$T(u) = T(u_1, u_2, \dots) = (1, u_1, u_2, \dots).$$
Let $S$ be the functional that keeps track of the sequence of low records (in the broad sense) of a sequence $u$ of real numbers. The functional $S$ is well defined and is continuous on a set of measure 1 (for the law of $U$) of $[0,1]^{\mathbb N}$, for instance on the set $\mathcal R$ of sequences $u$ without repetition such that $\liminf u = 0$. Then the functional $\mathcal C$ defined on $\mathbb N\times[0,1]^{\mathbb N}$ by

$$\mathcal C(k, u) = T^k\circ S(u)$$

is continuous as well (on $\mathbb N\times\mathcal R$), and $\mathcal C(L_n, d_n)$ converges in distribution to $\mathcal C(L, U)$.
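On finite prefixes, the functional $\mathcal C(k,u) = T^k\circ S(u)$ is straightforward to write down. The Python sketch below (names are ours) keeps low records in the broad sense, so that the trailing 0's used to complete $d_n$ are kept as records, and then prepends $k$ ones.

```python
def low_records(u):
    """Low records in the broad sense: u_i is kept when u_i <= min(u_1, ..., u_{i-1})."""
    out, best = [], float("inf")
    for x in u:
        if x <= best:
            out.append(x)
            best = x
    return out


def shift_T(u, k=1):
    """T^k: prepend k ones to the sequence."""
    return [1.0] * k + list(u)


def functional_C(k, u):
    """C(k, u) = T^k o S(u), on a finite prefix of u."""
    return shift_T(low_records(u), k)


print(functional_C(2, [0.5, 0.7, 0.3, 0.4, 0.1]))   # two leading ones, then records
```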
Set

$$s_{i,n} = 1 - (p_{1,n} + p_{2,n} + \cdots + p_{i,n}).$$

If $L_n(w) = k\ge1$, the first $k$ factors of the Lyndon decomposition are $k$ words reduced to the single letter "$a_1$". Thus, for $1\le i\le k$,

$$s_{i,n} = 1 - \frac in.$$

The next $A_n$ terms, $s_{k+1,n}, s_{k+2,n}, \dots, s_{k+A_n,n}$, are the low records of the sequence $d_n$. The difference between the two sequences $\mathcal C(L_n, d_n)$ and $s_n$ is thus

$$\mathcal C(L_n, d_n) - s_n = \Big(\frac1n, \frac2n, \dots, \frac kn, 0, 0, \dots, 0, -s_{k+A_n+1,n}, -s_{k+A_n+2,n}, \dots\Big).$$

Endowing $[0,1]^{\mathbb N}$ with the distance

$$d(u,v) = \sum_{k\ge1}2^{-k}|u_k - v_k|,$$

we obtain

$$d\big(s_n, \mathcal C(L_n, d_n)\big) \le \frac2n + 2^{-L_n-A_n}.$$

This inequality and Lemma 5.3 entail that the $d$-Wasserstein distance between $s_n$ and $\mathcal C(L_n, d_n)$ goes to 0. Since $\mathcal C(L_n, d_n)$ converges in distribution to $\mathcal C(L, U)$, $s_n$ converges in distribution to $\mathcal C(L, U)$ too.

For point 3, we note that $\mathcal C(L, U)$ and $s$ have the same distribution. Furthermore, the transformation sending $s_n$ to $p^{(n)}$, and $s$ to $p$, is bicontinuous. Thus the convergence in distribution of $p^{(n)}$ to $p$ follows.
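To experiment with the sequences $p^{(n)}$ and $s_n$ on concrete words, the Lyndon factorization can be computed with Duval's algorithm, a standard linear-time method that the text does not use or discuss. The Python sketch below assumes, as the discussion of the letters "$a_1$" at the end of the word suggests, that $p^{(n)}$ lists the normalized lengths of the Lyndon factors starting from the smallest (rightmost) one; the function names are hypothetical. It also implements the maps $p\mapsto s$ and $s\mapsto p$ ($p_i = s_{i-1}-s_i$, $s_0=1$), which are the continuous maps behind the bicontinuity claim.

```python
from fractions import Fraction


def lyndon_factorization(s):
    """Duval's algorithm: factors l_1 >= l_2 >= ... >= l_m with s = l_1 ... l_m."""
    n, i, factors = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def normalized_lengths(s):
    """Normalized lengths of the Lyndon factors, smallest (rightmost) factor first."""
    n = len(s)
    return [Fraction(len(f), n) for f in reversed(lyndon_factorization(s))]


def p_to_s(p):
    """s_i = 1 - (p_1 + ... + p_i)."""
    out, acc = [], Fraction(0)
    for x in p:
        acc += x
        out.append(1 - acc)
    return out


def s_to_p(s):
    """Inverse map: p_i = s_{i-1} - s_i with s_0 = 1."""
    prev, out = Fraction(1), []
    for x in s:
        out.append(prev - x)
        prev = x
    return out


print(lyndon_factorization("banana"), p_to_s(normalized_lengths("baaa")))
```

On "baaa" the three trailing letters "a" give $s_{i,n} = 1 - i/n$ for $i\le 3$, as in the text.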